Method of Measuring Adaptive Immunity

ABSTRACT

A method of measuring immunocompetence is described. This method provides a means for assessing the effects of diseases or conditions that compromise the immune system and of therapies aimed at reconstituting it. The method is based on quantifying T-cell diversity by calculating the number of distinct T-cell receptor (TCR) beta chain variable regions in blood cells.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. application Ser. No. 12/794,507, filed on Jun. 4, 2010, which claims the benefit of U.S. Provisional Application No. 61/220,344, filed on Jun. 25, 2009, which are hereby incorporated by reference in their entirety.

TECHNICAL FIELD

What is described is a method to measure the adaptive immunity of a patient by analyzing the diversity of T cell receptor genes or antibody genes using large-scale sequencing of nucleic acid extracted from adaptive immune system cells.

BACKGROUND

Immunocompetence is the ability of the body to produce a normal immune response (i.e., antibody production and/or cell-mediated immunity) following exposure to a pathogen, which might be a live organism (such as a bacterium or fungus), a virus, or specific antigenic components isolated from a pathogen and introduced in a vaccine. Immunocompetence is the opposite of immunodeficiency, that is, of being immuno-incompetent or immunocompromised. Examples include a newborn that does not yet have a fully functioning immune system but may have maternally transmitted antibody (immunodeficient); a late-stage AIDS patient with a failed or failing immune system (immuno-incompetent); a transplant recipient taking medication so that the body will not reject the donated organ (immunocompromised); elderly individuals with age-related attenuation of T cell function; and individuals exposed to radiation or chemotherapeutic drugs. These terms may overlap, but all of them indicate a dysfunctional immune system. In reference to lymphocytes, immunocompetence means that a B cell or T cell is mature and can recognize antigens, allowing a person to mount an immune response.

Immunocompetence depends on the ability of the adaptive immune system to mount an immune response specific for any potential foreign antigens, using the highly polymorphic receptors encoded by B cells (immunoglobulins, Igs) and T cells (T cell receptors, TCRs).

Igs expressed by B cells are proteins consisting of four polypeptide chains, two heavy chains (H chains) and two light chains (L chains), forming an H₂L₂ structure. Each pair of H and L chains contains a hypervariable domain, consisting of a V_(L) and a V_(H) region, and a constant domain. The H chains of Igs are of several types: μ, δ, γ, α, and ε. The diversity of Igs within an individual is mainly determined by the hypervariable domain. The V domain of H chains is created by the combinatorial joining of three types of germline gene segments, the V_(H), D_(H), and J_(H) segments. Hypervariable domain sequence diversity is further increased by independent addition and deletion of nucleotides at the V_(H)-D_(H), D_(H)-J_(H), and V_(H)-J_(H) junctions during the process of Ig gene rearrangement. In this respect, immunocompetence is reflected in the diversity of Igs.

TCRs expressed by αβ T cells are proteins consisting of two transmembrane polypeptide chains (α and β), expressed from the TCRA and TCRB genes, respectively. Similar TCR proteins are expressed in gamma-delta T cells, from the TCRD and TCRG loci. Each TCR polypeptide contains variable complementarity determining regions (CDRs), as well as framework regions (FRs) and a constant region. The sequence diversity of αβ T cells is largely determined by the amino acid sequence of the third complementarity-determining region (CDR3) loops of the α and β chain variable domains. This diversity results from recombination between variable (V_(β)), diversity (D_(β)), and joining (J_(β)) gene segments in the β chain locus, and between analogous V_(α) and J_(α) gene segments in the α chain locus. The existence of multiple such gene segments in the TCR α and β chain loci allows for a large number of distinct CDR3 sequences to be encoded. CDR3 sequence diversity is further increased by template-independent addition and deletion of nucleotides at the V_(β)-D_(β), D_(β)-J_(β), and V_(α)-J_(α) junctions during the process of TCR gene rearrangement. In this respect, immunocompetence is reflected in the diversity of TCRs.

There exists a long-felt need for methods of assessing or measuring the adaptive immune system of patients in a variety of settings, whether immunocompetence in the immunocompromised or dysregulated adaptive immunity in autoimmune disease. A demand exists for methods of diagnosing a disease state or the effects of aging by assessing the immunocompetence of a patient. Similarly, the results of therapies that modify the immune system need to be monitored by assessing the immunocompetence of the patient while undergoing the treatment. Conversely, a demand exists for methods to monitor the adaptive immune system in the context of autoimmune disease flares and remissions, in order to monitor response to therapy or the need to initiate prophylactic therapy pre-symptomatically.

SUMMARY

One aspect of the invention is a composition comprising:

-   a multiplicity of V-segment primers, wherein each primer comprises a sequence that is complementary to a single functional V segment or a small family of V segments; and
-   a multiplicity of J-segment primers, wherein each primer comprises a sequence that is complementary to a J segment;

wherein the V-segment and J-segment primers permit amplification of a TCR CDR3 region by a multiplex polymerase chain reaction (PCR) to produce a multiplicity of amplified DNA molecules sufficient to quantify the diversity of the TCR genes. One embodiment of the invention is the composition, wherein each V-segment primer comprises a sequence that is complementary to a single Vβ segment, and each J-segment primer comprises a sequence that is complementary to a Jβ segment, and wherein the V-segment and J-segment primers permit amplification of a TCRβ CDR3 region. Another embodiment is the composition, wherein each V-segment primer comprises a sequence that is complementary to a single functional Vα segment, and each J-segment primer comprises a sequence that is complementary to a Jα segment, and wherein the V-segment and J-segment primers permit amplification of a TCRα CDR3 region.

Another embodiment of the invention is the composition, wherein the V segment primers hybridize with a conserved segment and have similar annealing strength. Another embodiment is wherein the V segment primer is anchored at position −43 in the Vβ segment relative to the recombination signal sequence (RSS). Another embodiment is wherein the multiplicity of V segment primers consists of at least 45 primers specific to 45 different Vβ genes. Another embodiment is wherein the V segment primers have sequences that are selected from the group consisting of SEQ ID NOS:1-45. Another embodiment is wherein the V segment primers have sequences that are selected from the group consisting of SEQ ID NOS:58-102. Another embodiment is wherein there is a V segment primer for each Vβ segment.

Another embodiment of the invention is the composition, wherein the J segment primers hybridize with a conserved framework region element of the Jβ segment and have similar annealing strength. Another embodiment is wherein the multiplicity of J segment primers consists of at least thirteen primers specific to thirteen different Jβ genes. Another embodiment is wherein the J segment primers have sequences that are selected from the group consisting of SEQ ID NOS:46-57. Another embodiment is wherein the J segment primers have sequences that are selected from the group consisting of SEQ ID NOS:102-113. Another embodiment is wherein there is a J segment primer for each Jβ segment. Another embodiment is wherein all J segment primers anneal to the same conserved motif.

Another embodiment of the invention is the composition, wherein the amplified DNA molecule starts from said conserved motif, amplifies adequate sequence to diagnostically identify the J segment, includes the CDR3 junction, and extends into the V segment. Another embodiment is wherein the amplified Jβ gene segments each have a unique four-base tag at positions +11 through +14 downstream of the RSS site.

Another aspect of the invention is the composition further comprising a set of sequencing oligonucleotides, wherein the sequencing oligonucleotides hybridize to regions within the amplified DNA molecules. An embodiment is wherein the sequencing oligonucleotides hybridize adjacent to a four-base tag within the amplified Jβ gene segments at positions +11 through +14 downstream of the RSS site. Another embodiment is wherein the sequencing oligonucleotides are selected from the group consisting of SEQ ID NOS:58-70. Another embodiment is wherein the V segment or J segment primers are selected to permit sequence error correction by merger of closely related sequences. Another embodiment is the composition, further comprising a universal C segment primer for generating cDNA from mRNA.

Another aspect of the invention is a composition comprising:

-   a multiplicity of V segment primers, wherein each V segment primer comprises a sequence that is complementary to a single functional V segment or a small family of V segments; and
-   a multiplicity of J segment primers, wherein each J segment primer comprises a sequence that is complementary to a J segment;

wherein the V segment and J segment primers permit amplification of the TCRG CDR3 region by a multiplex polymerase chain reaction (PCR) to produce a multiplicity of amplified DNA molecules sufficient to quantify the diversity of the TCRG genes.

Another aspect of the invention is a composition comprising:

-   a multiplicity of V segment primers, wherein each V segment primer comprises a sequence that is complementary to a single functional V segment or a small family of V segments; and
-   a multiplicity of J segment primers, wherein each J segment primer comprises a sequence that is complementary to a J segment;

wherein the V segment and J segment primers permit amplification of the antibody heavy chain (IGH) CDR3 region by a multiplex polymerase chain reaction (PCR) to produce a multiplicity of amplified DNA molecules sufficient to quantify the diversity of antibody heavy chain genes.

Another aspect of the invention is a composition comprising:

-   a multiplicity of V segment primers, wherein each V segment primer comprises a sequence that is complementary to a single functional V segment or a small family of V segments; and
-   a multiplicity of J segment primers, wherein each J segment primer comprises a sequence that is complementary to a J segment;

wherein the V segment and J segment primers permit amplification of the antibody light chain (IGL) V_(L) region by a multiplex polymerase chain reaction (PCR) to produce a multiplicity of amplified DNA molecules sufficient to quantify the diversity of antibody light chain genes.

Another aspect of the invention is a method comprising:

-   selecting a multiplicity of V segment primers, wherein each V segment primer comprises a sequence that is complementary to a single functional V segment or a small family of V segments;
-   selecting a multiplicity of J segment primers, wherein each J segment primer comprises a sequence that is complementary to a J segment; and
-   combining the V segment and J segment primers with a sample of genomic DNA to permit amplification of a CDR3 region by a multiplex polymerase chain reaction (PCR) to produce a multiplicity of amplified DNA molecules sufficient to quantify the diversity of the TCR genes.

One embodiment of the invention is the method wherein each V segment primer comprises a sequence that is complementary to a single functional Vβ segment, and each J segment primer comprises a sequence that is complementary to a Jβ segment; and wherein combining the V segment and J segment primers with a sample of genomic DNA permits amplification of a TCRβ CDR3 region by a multiplex polymerase chain reaction (PCR) and produces a multiplicity of amplified DNA molecules. Another embodiment is wherein each V segment primer comprises a sequence that is complementary to a single functional Vα segment, and each J segment primer comprises a sequence that is complementary to a Jα segment; and wherein combining the V segment and J segment primers with a sample of genomic DNA permits amplification of a TCRα CDR3 region by a multiplex polymerase chain reaction (PCR) and produces a multiplicity of amplified DNA molecules.

Another embodiment of the invention is the method further comprising a step of sequencing the amplified DNA molecules. Another embodiment is wherein the sequencing step utilizes a set of sequencing oligonucleotides that hybridize to regions within the amplified DNA molecules. Another embodiment is the method further comprising a step of calculating the total diversity of TCRβ CDR3 sequences among the amplified DNA molecules. Another embodiment is wherein the method shows that the total diversity of a normal human subject is greater than 1×10⁶ sequences, greater than 2×10⁶ sequences, or greater than 3×10⁶ sequences.

Another aspect of the invention is a method of diagnosing immunodeficiency in a human patient, comprising measuring the diversity of TCR CDR3 sequences of the patient and comparing the diversity of the patient to the diversity obtained from a normal subject. An embodiment of the invention is the method, wherein measuring the diversity of TCR sequences comprises the steps of:

-   selecting a multiplicity of V segment primers, wherein each V segment primer comprises a sequence that is complementary to a single functional V segment or a small family of V segments;
-   selecting a multiplicity of J segment primers, wherein each J segment primer comprises a sequence that is complementary to a J segment;
-   combining the V segment and J segment primers with a sample of genomic DNA to permit amplification of a TCR CDR3 region by a multiplex polymerase chain reaction (PCR) to produce a multiplicity of amplified DNA molecules;
-   sequencing the amplified DNA molecules; and
-   calculating the total diversity of TCR CDR3 sequences among the amplified DNA molecules.
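The final two steps above (sequencing, then calculating total diversity) can be sketched in Python. This is a minimal illustration, not the method as practiced: it assumes each sequenced amplicon has already been reduced to its CDR3 nucleotide sequence by a hypothetical upstream parsing step, and it performs no error correction, so sequencing errors would inflate the count. The function names are illustrative.

```python
from collections import Counter
from typing import Iterable


def clone_counts(cdr3_seqs: Iterable[str]) -> Counter:
    """Tally how many sequenced molecules carry each distinct CDR3 sequence.

    Sequences are upper-cased and blank entries are skipped; no error
    correction or merging of near-identical sequences is attempted.
    """
    cleaned = (s.strip().upper() for s in cdr3_seqs)
    return Counter(s for s in cleaned if s)


def observed_diversity(cdr3_seqs: Iterable[str]) -> int:
    """Number of unique CDR3 sequences observed in this sample.

    This is a lower bound on repertoire diversity: clonotypes that are
    present in the subject but were not sampled are not counted.
    """
    return len(clone_counts(cdr3_seqs))


if __name__ == "__main__":
    reads = ["TGTGCCAGCAGC", "tgtgccagcagc", "TGTGCCAGTAGT", ""]
    print(clone_counts(reads))
    print(observed_diversity(reads))
```

The per-sequence counts returned by `clone_counts` are also the input needed for estimating unseen diversity, since they give the number of clonotypes observed exactly x times.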

An embodiment of the invention is the method, wherein the diversity is compared by calculating the following equation:

${\Delta \; (t)} = {{{\sum\limits_{x}{E\left( n_{x} \right)}_{{{measurement}\; 1} + 2}} - {\sum\limits_{x}{E\left( n_{x} \right)}_{{measurement}\; 2}}} = {S{\int_{0}^{\infty}{{^{- \lambda}\left( {1 - ^{{- \lambda}\; t}} \right)}\ {{G(\lambda)}}}}}}$

wherein G(λ) is the empirical distribution function of the parameters λ₁, …, λ_(S), one parameter per clonotype, so that S is the number of clonotypes; n_(x) is the number of clonotypes sequenced exactly x times; and

${E\left( n_{x} \right)} = {S{\int_{0}^{\infty}{\left( \frac{^{- \lambda}\lambda^{x}}{x!} \right)\ {{{G(\lambda)}}.}}}}$

Another embodiment of the invention is the method, wherein the diversity of at least two samples of genomic DNA is compared. Another embodiment is wherein one sample of genomic DNA is from a patient and the other sample is from a normal subject. Another embodiment is wherein one sample of genomic DNA is from a patient before a therapeutic treatment and the other sample is from the patient after treatment. Another embodiment is wherein the two samples of genomic DNA are from the same patient at different times during treatment. Another embodiment is wherein a disease is diagnosed based on the comparison of diversity among the samples of genomic DNA. Another embodiment is wherein the immunocompetence of a human patient is assessed by the comparison.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

The TCR and Ig genes can generate millions of distinct proteins via somatic recombination. Because of this diversity-generating mechanism, the hypervariable complementarity determining regions of these genes can encode sequences that can interact with millions of ligands, and these regions are linked to a constant region that can transmit a signal to the cell indicating binding of the protein's cognate ligand.

The adaptive immune system employs several strategies to generate a repertoire of T- and B-cell antigen receptors with sufficient diversity to recognize the universe of potential pathogens. In αβ and γδ T cells, which primarily recognize peptide antigens presented by MHC molecules, most of this receptor diversity is contained within the third complementarity-determining region (CDR3) of the T cell receptor (TCR) α and β chains (or γ and δ chains). Although it has been estimated that the adaptive immune system can generate up to 10¹⁸ distinct TCR αβ pairs, direct experimental assessment of TCR CDR3 diversity has not been possible.

What is described herein is a novel method of measuring TCR CDR3 diversity that is based on single molecule DNA sequencing, and the use of this approach to sequence the CDR3 regions in millions of rearranged TCRβ genes isolated from peripheral blood T cells of two healthy adults.

The ability of the adaptive immune system to mount an immune response specific for any of the vast number of potential foreign antigens to which an individual might be exposed relies on the highly polymorphic receptors encoded by B cells (immunoglobulins) and T cells (T cell receptors; TCRs). The TCRs expressed by αβ T cells, which primarily recognize peptide antigens presented by major histocompatibility complex (MHC) class I and II molecules, are heterodimeric proteins consisting of two transmembrane polypeptide chains (α and β), each containing one variable and one constant domain. The peptide specificity of αβ T cells is in large part determined by the amino acid sequence encoded in the third complementarity-determining region (CDR3) loops of the α and β chain variable domains. The CDR3 regions of the β and α chains are formed by recombination between noncontiguous variable (V_(β)), diversity (D_(β)), and joining (J_(β)) gene segments in the β chain locus, and between analogous V_(α) and J_(α) gene segments in the α chain locus, respectively. The existence of multiple such gene segments in the TCR α and β chain loci allows for a large number of distinct CDR3 sequences to be encoded. CDR3 sequence diversity is further increased by template-independent addition and deletion of nucleotides at the V_(β)-D_(β), D_(β)-J_(β), and V_(α)-J_(α) junctions during the process of TCR gene rearrangement.

Previous attempts to assess the diversity of receptors in the adult human αβ T cell repertoire relied on examining rearranged TCR α and β chain genes expressed in small, well-defined subsets of the repertoire, followed by extrapolation of the diversity present in these subsets to the entire repertoire. These attempts estimated approximately 10⁶ unique TCRβ chain CDR3 sequences per individual, with 10-20% of these unique TCRβ CDR3 sequences expressed by cells in the antigen-experienced CD45RO⁺ compartment. The accuracy and precision of this estimate are severely limited by the need to extrapolate the diversity observed in hundreds of sequences to the entire repertoire, and it is possible that the actual number of unique TCRβ chain CDR3 sequences in the αβ T cell repertoire is significantly larger than 1×10⁶.

Recent advances in high-throughput DNA sequencing technology have made possible significantly deeper sequencing than capillary-based technologies. In one such approach, a complex library of template molecules carrying universal PCR adapter sequences at each end is hybridized to a lawn of complementary oligonucleotides immobilized on a solid surface. Solid-phase PCR is used to amplify the hybridized library, resulting in millions of template clusters on the surface, each comprising multiple (~1,000) identical copies of a single DNA molecule from the original library. A 30-54 bp interval in the molecules in each cluster is sequenced using reversible dye-termination chemistry, permitting simultaneous sequencing of the rearranged TCRβ chain CDR3 regions carried in the genomic DNA of millions of T cells. This approach enables direct sequencing of a significant fraction of the uniquely rearranged TCRβ CDR3 regions in populations of αβ T cells, and thereby permits estimation of the relative frequency of each CDR3 sequence in the population.

Accurately estimating the diversity of TCRβ CDR3 sequences in the entire αβ T cell repertoire from the diversity measured in a finite sample of T cells requires an estimate of the number of CDR3 sequences present in the repertoire that were not observed in the sample. TCRβ chain CDR3 diversity in the entire αβ T cell repertoire was estimated using direct measurements of the number of unique TCRβ CDR3 sequences observed in blood samples containing millions of αβ T cells. The results herein identify a lower bound for TCRβ CDR3 diversity in the CD4⁺ and CD8⁺ T cell compartments that is several-fold higher than previous estimates. In addition, the results herein demonstrate that there are at least 1.5×10⁶ unique TCRβ CDR3 sequences in the CD45RO⁺ compartment of antigen-experienced T cells, a large proportion of which are present at low relative frequency. The existence of such a diverse population of TCRβ CDR3 sequences in antigen-experienced cells has not been previously demonstrated.

The diverse pool of TCRβ chains in each healthy individual is a sample from an estimated theoretical space of greater than 10¹¹ possible sequences. However, the realized set of rearranged TCRs is not evenly sampled from this theoretical space. Different Vβ and Jβ segments are used at frequencies that differ by more than a thousand-fold. Additionally, the insertion rates of nucleotides are strongly biased. This reduced space of realized TCRβ sequences leads to the possibility of shared β chains between people. With the sequence data generated by the methods described herein, the in vivo J usage, V usage, mono- and di-nucleotide biases, and position-dependent amino acid usage can be computed. These biases significantly narrow the size of the sequence space from which TCRβ chains are selected, suggesting that different individuals share TCRβ chains with identical amino acid sequences. Results herein show that many thousands of such identical sequences are shared pairwise between individual human genomes.
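Two of the quantities mentioned above, V and J segment usage and the set of CDR3 amino acid sequences shared between two individuals, reduce to simple counting once each rearrangement has been annotated. The sketch below assumes a hypothetical record layout (a dict with keys `"v"`, `"j"`, and `"cdr3_aa"`) that is not specified in the text; nucleotide-bias and position-dependent amino acid statistics would follow the same tallying pattern.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical record layout chosen for illustration:
# {"v": "TRBV5-1", "j": "TRBJ2-7", "cdr3_aa": "CASS...F"}
Record = Dict[str, str]


def segment_usage(records: List[Record], field: str) -> Dict[str, float]:
    """Fraction of rearrangements that use each V (field="v") or J (field="j") segment."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {segment: c / total for segment, c in counts.items()}


def shared_cdr3(a: List[Record], b: List[Record]) -> Set[str]:
    """CDR3 amino acid sequences found in both individuals' repertoires."""
    return {r["cdr3_aa"] for r in a} & {r["cdr3_aa"] for r in b}
```

Counting each distinct sequence once (as here) measures usage across clonotypes; weighting by read counts instead would measure usage across cells, which is skewed by expanded clones.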

The assay technology uses two pools of primers to provide for a highly multiplexed PCR reaction. The “forward” pool has a primer specific to each V segment in the gene (where V segments share a highly conserved region, a single primer targeting that region is used to capture several V segments simultaneously). The “reverse” pool primers anneal to a conserved sequence in the joining (“J”) segment. The amplified segment pool includes adequate sequence to identify each J segment and also to allow a J-segment-specific primer to anneal for resequencing. This enables direct observation of a large fraction of the somatic rearrangements present in an individual, which in turn enables rapid comparison of the TCR repertoire in individuals with an autoimmune disorder (or other target disease indication) against the TCR repertoire of controls.
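The two-pool design can be illustrated by asking which forward and reverse primers are consistent with a given amplicon. This sketch assumes the amplicon is written 5′ to 3′ on the strand that begins with the forward (V) primer, with the Xn and Ym adapter sequences already trimmed, so the amplicon should end with the reverse complement of the reverse (J) primer. The pools are short excerpts of the gene-specific sequences in Tables 1 and 2; the function names are hypothetical.

```python
# Excerpts of gene-specific primer sequences from Table 1 (forward, V) and
# Table 2 (reverse, J), with the Xn / Ym adapter sequences omitted.
V_POOL = {
    "TRBV2": "TCAAATTTCACTCTGAAGATCCGGTCCACAA",
    "TRBV6-1": "TCGCTCAGGCTGGAGTCGGCTG",
    "TRBV30": "CGGCAGTTCATCCTGAGTTCTAAGAAGC",
}
J_POOL = {
    "TRBJ2-1": "CGGTGAGCCGTGTCCCTGGCCCGAA",
    "TRBJ2-7": "GTGAGCCTGGTGCCCGGCCCGAA",
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def classify_amplicon(amplicon: str, v_pool=V_POOL, j_pool=J_POOL):
    """Return (forward primer names, reverse primer names) matching an amplicon.

    The forward primer must appear at the 5' end; the reverse primer, which
    anneals to this strand, appears as its reverse complement at the 3' end.
    """
    s = amplicon.upper()
    fwd = sorted(v for v, p in v_pool.items() if s.startswith(p))
    rev = sorted(j for j, p in j_pool.items() if s.endswith(revcomp(p)))
    return fwd, rev
```

Because several forward primers each cover a family of V segments (e.g., TRBV(6-2, 6-3)), the matching primer is not by itself a V gene call; the V gene is identified from the sequence lying between the primer and the RSS.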

The adaptive immune system can in theory generate an enormous diversity of T cell receptor CDR3 sequences, far more than are likely to be expressed in any one individual at any one time. Previous attempts to measure what fraction of this theoretical diversity is actually utilized in the adult αβ T cell repertoire, however, have not permitted accurate assessment of the diversity. What is described herein is a novel approach to this question that is based on single molecule DNA sequencing and an analytic computational approach to estimating repertoire diversity from diversity measurements in finite samples. The analysis demonstrated that the number of unique TCRβ CDR3 sequences in the adult repertoire significantly exceeds previous estimates based on exhaustive capillary sequencing of small segments of the repertoire. The TCRβ chain diversity in the CD45RO⁻ population (enriched for naïve T cells) observed using the methods described herein is five-fold larger than previously reported. A major discovery concerns the number of unique TCRβ CDR3 sequences expressed in antigen-experienced CD45RO⁺ T cells: the results herein show that this number is between 10 and 20 times larger than expected based on previous results of others. The frequency distribution of CDR3 sequences in CD45RO⁺ cells suggests that the T cell repertoire contains a large number of clones with a small clone size.

The results herein show that the realized set of TCRβ chains is sampled non-uniformly from the huge potential space of sequences. In particular, β chain sequences closer to germline (few insertions and deletions at the V-D and D-J boundaries) appear to be created at a relatively high frequency. TCR sequences close to germline are shared between different people because the germline sequences for the V, D, and J segments are shared, modulo a small number of polymorphisms, across the human population.

The T cell receptors expressed by mature αβ T cells are heterodimers whose two constituent chains are generated by independent rearrangement events of the TCR α and β chain variable loci. The α chain has less diversity than the β chain, so a higher fraction of α chains are shared between individuals, and hundreds of identical TCR αβ receptors are shared between any pair of individuals.

Cells

B cells and T cells can be obtained from a variety of tissue samples, including marrow, thymus, lymph glands, peripheral tissues, and blood, but peripheral blood is most easily accessed. Peripheral blood samples are obtained by phlebotomy from subjects. Peripheral blood mononuclear cells (PBMC) are isolated by techniques known to those of skill in the art, e.g., by Ficoll-Hypaque® density gradient separation. Preferably, whole PBMCs are used for analysis. Alternatively, the B and/or T lymphocytes may be flow sorted into multiple compartments for each subject, e.g., CD8⁺CD45RO^(+/−) and CD4⁺CD45RO^(+/−), using fluorescently labeled anti-human antibodies, e.g., CD4 FITC (clone M-T466, Miltenyi Biotec), CD8 PE (clone RPA-T8, BD Biosciences), CD45RO ECD (clone UCHL-1, Beckman Coulter), and CD45RO APC (clone UCHL-1, BD Biosciences). Staining of total PBMCs may be done with the appropriate combination of antibodies, followed by washing the cells before analysis. Lymphocyte subsets can be isolated by FACS sorting, e.g., with a BD FACSAria™ cell-sorting system (BD Biosciences), with results analyzed using FlowJo software (Treestar Inc.), and also by conceptually similar methods involving specific antibodies immobilized to surfaces or beads.

Nucleic Acid Extraction

Total genomic DNA is extracted from cells, e.g., by using the QIAamp® DNA Blood Mini Kit (QIAGEN®). The approximate mass of a single haploid genome is 3 pg, so a diploid cell carries about 6 pg of DNA. Preferably, at least 100,000 to 200,000 cells are used for analysis of diversity, i.e., about 0.6 to 1.2 μg DNA from diploid T cells. Using PBMCs as a source, the number of T cells can be estimated to be about 30% of total cells.
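The input-DNA arithmetic in this paragraph can be checked with a short sketch. It uses only the figures stated above (about 3 pg per haploid genome and T cells making up about 30% of PBMCs); both constants are approximations rather than assay specifications, and the function names are hypothetical.

```python
import math

HAPLOID_GENOME_PG = 3.0          # approximate mass of one haploid human genome
T_CELL_FRACTION_OF_PBMC = 0.30   # approximate share of T cells among PBMCs


def dna_mass_ug(n_cells: int, ploidy: int = 2) -> float:
    """Approximate genomic DNA mass, in micrograms, carried by n_cells."""
    return n_cells * ploidy * HAPLOID_GENOME_PG / 1e6  # picograms -> micrograms


def pbmc_needed(n_t_cells: int) -> int:
    """Approximate number of PBMCs required to supply n_t_cells T cells."""
    return math.ceil(n_t_cells / T_CELL_FRACTION_OF_PBMC)


if __name__ == "__main__":
    for n in (100_000, 200_000):
        print(n, round(dna_mass_ug(n), 2), pbmc_needed(n))
```

For 100,000 and 200,000 T cells this gives about 0.6 μg and 1.2 μg of DNA, matching the range stated above, and implies starting from roughly 0.33 to 0.67 million PBMCs.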

Alternatively, total nucleic acid, including both genomic DNA and mRNA, can be isolated from cells. If diversity is to be measured from mRNA in the nucleic acid extract, the mRNA must be converted to cDNA prior to measurement. This can readily be done by methods known to one of ordinary skill.

DNA Amplification

A multiplex PCR system is used to amplify rearranged TCR loci from genomic DNA, preferably from a CDR3 region, more preferably from a TCRα, TCRγ, or TCRδ CDR3 region, and most preferably from a TCRβ CDR3 region.

In general, a multiplex PCR system may use at least 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25, preferably 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, or 39, and most preferably 40, 41, 42, 43, 44, or 45 forward primers, in which each forward primer is specific to a sequence corresponding to one or more TRB V region segments shown in SEQ ID NOS:114-248; and at least 3, 4, 5, 6, or 7, preferably 8, 9, 10, 11, 12, or 13 reverse primers, in which each reverse primer is specific to a sequence corresponding to one or more TRB J region segments shown in SEQ ID NOS:249-261. Most preferably, there is a J segment primer for every J segment.

Preferably, the primers are designed not to cross an intron/exon boundary. The forward primers should preferably anneal to the V segments in a region of relatively strong sequence conservation between V segments, so as to maximize the conservation of sequence among these primers. This minimizes differences in annealing properties among the primers, while ensuring that the amplified region between the V and J primers contains sufficient TCR V sequence information to identify the specific V gene segment used.
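Similar annealing strength across a primer pool can be screened roughly by comparing estimated melting temperatures. The sketch below swaps in the basic GC-content formula Tm = 64.9 + 41 × (n_G + n_C − 16.4) / N, a crude approximation for oligonucleotides longer than about 13 nt that ignores salt, primer concentration, and nearest-neighbor stacking; a real design would use a nearest-neighbor model. The choice of formula and the function names are assumptions, not part of the described method.

```python
from typing import Dict


def basic_tm(seq: str) -> float:
    """Crude melting temperature (deg C) from GC content:
    Tm = 64.9 + 41 * (nG + nC - 16.4) / N.
    """
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    gc = sum(1 for base in s if base in "GC")
    return 64.9 + 41.0 * (gc - 16.4) / len(s)


def tm_spread(primers: Dict[str, str]) -> float:
    """Difference between the highest and lowest estimated Tm in a primer pool."""
    tms = [basic_tm(seq) for seq in primers.values()]
    return max(tms) - min(tms)


if __name__ == "__main__":
    # Gene-specific portions of three Table 1 forward primers.
    pool = {
        "TRBV6-1": "TCGCTCAGGCTGGAGTCGGCTG",
        "TRBV7-2": "CACTCTGACGATCCAGCGCACAC",
        "TRBV29-1": "CTAACATTCTCAACTCTGACTGTGAGCAACA",
    }
    for name, seq in pool.items():
        print(name, round(basic_tm(seq), 1))
    print("spread", round(tm_spread(pool), 1))
```

The primers in Table 1 vary in length (roughly 20 to 31 nt), which is one common way to bring primers of different GC content toward a similar Tm.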

Preferably, the J segment primers hybridize with a conserved element of the J segment and have similar annealing strength. Most preferably, all J segment primers anneal to the same conserved framework region motif. The forward and reverse primers are both preferably modified at the 5′ end with universal primer sequences compatible with a DNA sequencer.

For example, a multiplex PCR system may use 45 forward primers (Table 1), each specific to a functional TCR Vβ segment, and thirteen reverse primers (Table 2), each specific to a TCR Jβ segment. Xn and Ym correspond to polynucleotides of lengths n and m, respectively, which would be specific to the single molecule sequencing technology being used to read out the assay.

TABLE 1  TCR-Vβ Forward primer sequences

| TRBV gene segment(s) | SEQ ID NO: | Primer sequence* |
|---|---|---|
| TRBV2 | 1 | XnTCAAATTTCACTCTGAAGATCCGGTCCACAA |
| TRBV3-1 | 2 | XnGCTCACTTAAATCTTCACATCAATTCCCTGG |
| TRBV4-1 | 3 | XnCTTAAACCTTCACCTACACGCCCTGC |
| TRBV(4-2, 4-3) | 4 | XnCTTATTCCTTCACCTACACACCCTGC |
| TRBV5-1 | 5 | XnGCTCTGAGATGAATGTGAGCACCTTG |
| TRBV5-3 | 6 | XnGCTCTGAGATGAATGTGAGTGCCTTG |
| TRBV(5-4, 5-5, 5-6, 5-7, 5-8) | 7 | XnGCTCTGAGCTGAATGTGAACGCCTTG |
| TRBV6-1 | 8 | XnTCGCTCAGGCTGGAGTCGGCTG |
| TRBV(6-2, 6-3) | 9 | XnGCTGGGGTTGGAGTCGGCTG |
| TRBV6-4 | 10 | XnCCCTCACGTTGGCGTCTGCTG |
| TRBV6-5 | 11 | XnGCTCAGGCTGCTGTCGGCTG |
| TRBV6-6 | 12 | XnCGCTCAGGCTGGAGTTGGCTG |
| TRBV6-7 | 13 | XnCCCCTCAAGCTGGAGTCAGCTG |
| TRBV6-8 | 14 | XnCACTCAGGCTGGTGTCGGCTG |
| TRBV6-9 | 15 | XnCGCTCAGGCTGGAGTCAGCTG |
| TRBV7-1 | 16 | XnCCACTCTGAAGTTCCAGCGCACAC |
| TRBV7-2 | 17 | XnCACTCTGACGATCCAGCGCACAC |
| TRBV7-3 | 18 | XnCTCTACTCTGAAGATCCAGCGCACAG |
| TRBV7-4 | 19 | XnCCACTCTGAAGATCCAGCGCACAG |
| TRBV7-6 | 20 | XnCACTCTGACGATCCAGCGCACAG |
| TRBV7-7 | 21 | XnCCACTCTGACGATTCAGCGCACAG |
| TRBV7-8 | 22 | XnCCACTCTGAAGATCCAGCGCACAC |
| TRBV7-9 | 23 | XnCACCTTGGAGATCCAGCGCACAG |
| TRBV9 | 24 | XnGCACTCTGAACTAAACCTGAGCTCTCTG |
| TRBV10-1 | 25 | XnCCCCTCACTCTGGAGTCTGCTG |
| TRBV10-2 | 26 | XnCCCCCTCACTCTGGAGTCAGCTA |
| TRBV10-3 | 27 | XnCCTCCTCACTCTGGAGTCCGCTA |
| TRBV(11-1, 11-3) | 28 | XnCCACTCTCAAGATCCAGCCTGCAG |
| TRBV11-2 | 29 | XnCTCCACTCTCAAGATCCAGCCTGCAA |
| TRBV(12-3, 12-4, 12-5) | 30 | XnCCACTCTGAAGATCCAGCCCTCAG |
| TRBV13 | 31 | XnCATTCTGAACTGAACATGAGCTCCTTGG |
| TRBV14 | 32 | XnCTACTCTGAAGGTGCAGCCTGCAG |
| TRBV15 | 33 | XnGATAACTTCCAATCCAGGAGGCCGAACA |
| TRBV16 | 34 | XnCTGTAGCCTTGAGATCCAGGCTACGA |
| TRBV17 | 35 | XnCTTCCACGCTGAAGATCCATCCCG |
| TRBV18 | 36 | XnGCATCCTGAGGATCCAGCAGGTAG |
| TRBV19 | 37 | XnCCTCTCACTGTGACATCGGCCC |
| TRBV20-1 | 38 | XnCTTGTCCACTCTGACAGTGACCAGTG |
| TRBV23-1 | 39 | XnCAGCCTGGCAATCCTGTCCTCAG |
| TRBV24-1 | 40 | XnCTCCCTGTCCCTAGAGTCTGCCAT |
| TRBV25-1 | 41 | XnCCCTGACCCTGGAGTCTGCCA |
| TRBV27 | 42 | XnCCCTGATCCTGGAGTCGCCCA |
| TRBV28 | 43 | XnCTCCCTGATTCTGGAGTCCGCCA |
| TRBV29-1 | 44 | XnCTAACATTCTCAACTCTGACTGTGAGCAACA |
| TRBV30 | 45 | XnCGGCAGTTCATCCTGAGTTCTAAGAAGC |

TABLE 2  TCR-Jβ Reverse Primer Sequences

| TRBJ gene segment | SEQ ID NO: | Primer sequence* |
|---|---|---|
| TRBJ1-1 | 46 | YmTTACCTACAACTGTGAGTCTGGTGCCTTGTCCAAA |
| TRBJ1-2 | 47 | YmACCTACAACGGTTAACCTGGTCCCCGAACCGAA |
| TRBJ1-3 | 48 | YmACCTACAACAGTGAGCCAACTTCCCTCTCCAAA |
| TRBJ1-4 | 49 | YmCCAAGACAGAGAGCTGGGTTCCACTGCCAAA |
| TRBJ1-5 | 483 | YmACCTAGGATGGAGAGTCGAGTCCCATCACCAAA |
| TRBJ1-6 | 50 | YmCTGTCACAGTGAGCCTGGTCCCGTTCCCAAA |
| TRBJ2-1 | 51 | YmCGGTGAGCCGTGTCCCTGGCCCGAA |
| TRBJ2-2 | 52 | YmCCAGTACGGTCAGCCTAGAGCCTTCTCCAAA |
| TRBJ2-3 | 53 | YmACTGTCAGCCGGGTGCCTGGGCCAAA |
| TRBJ2-4 | 54 | YmAGAGCCGGGTCCCGGCGCCGAA |
| TRBJ2-5 | 55 | YmGGAGCCGCGTGCCTGGCCCGAA |
| TRBJ2-6 | 56 | YmGTCAGCCTGCTGCCGGCCCCGAA |
| TRBJ2-7 | 57 | YmGTGAGCCTGGTGCCCGGCCCGAA |

The 45 forward PCR primers of Table 1 are complementary to each of the 48 functional variable (V) gene segments (some primers cover more than one closely related segment, as listed in Table 1), and the thirteen reverse PCR primers of Table 2 are complementary to each of the functional joining (J) gene segments from the TRB locus (TRBJ). The TRB V region segments are identified in the Sequence Listing at SEQ ID NOS:114-248 and the TRB J region segments at SEQ ID NOS:249-261. The primers have been designed so that the amplified sequence contains enough information to identify both the V and J genes uniquely (>40 base pairs of sequence upstream of the V gene recombination signal sequence (RSS), and >30 base pairs downstream of the J gene RSS). Alternative primers may be selected by one of ordinary skill from the V and J regions of the genes of each TCR subunit.

The forward primers are modified at the 5′ end with the universal forward primer sequence compatible with the DNA sequencer (Xn of Table 1). Similarly, all of the reverse primers are modified with a universal reverse primer sequence (Ym of Table 2). One example of such universal primers is shown in Tables 3 and 4, for the Illumina GAII single-end read sequencing system. The 45 TCR Vβ forward primers anneal to the Vβ segments in a region of relatively strong sequence conservation between Vβ segments so as to maximize the conservation of sequence among these primers.

TABLE 3  TCR-Vβ Forward primer sequences

TRBV gene segment(s)           SEQ ID NO:  Primer sequence*
TRBV2                          58          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTTCAAATTTCACTCTGAAGATCCGGTCCACAA
TRBV3-1                        59          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTGCTCACTTAAATCTTCACATCAATTCCCTGG
TRBV4-1                        60          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCTTAAACCTTCACCTACACGCCCTGC
TRBV(4-2, 4-3)                 61          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCTTATTCCTTCACCTACACACCCTGC
TRBV5-1                        62          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTGCTCTGAGATGAATGTGAGCACCTTG
TRBV5-3                        63          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTGCTCTGAGATGAATGTGAGTGCCTTG
TRBV(5-4, 5-5, 5-6, 5-7, 5-8)  64          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTGCTCTGAGCTGAATGTGAACGCCTTG
TRBV6-1                        65          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTTCGCTCAGGCTGGAGTCGGCTG
TRBV(6-2, 6-3)                 66          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTGCTGGGGTTGGAGTCGGCTG
TRBV6-4                        67          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCCCTCACGTTGGCGTCTGCTG
TRBV6-5                        68          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTGCTCAGGCTGCTGTCGGCTG
TRBV6-6                        69          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCGCTCAGGCTGGAGTTGGCTG
TRBV6-7                        70          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCCCCTCAAGCTGGAGTCAGCTG
TRBV6-8                        71          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCACTCAGGCTGGTGTCGGCTG
TRBV6-9                        72          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCGCTCAGGCTGGAGTCAGCTG
TRBV7-1                        73          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCCACTCTGAAGTTCCAGCGCACAC
TRBV7-2                        74          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCACTCTGACGATCCAGCGCACAC
TRBV7-3                        75          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCTCTACTCTGAAGATCCAGCGCACAG
TRBV7-4                        76          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCCACTCTGAAGATCCAGCGCACAG
TRBV7-6                        77          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCACTCTGACGATCCAGCGCACAG
TRBV7-7                        78          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCCACTCTGACGATTCAGCGCACAG
TRBV7-8                        79          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCCACTCTGAAGATCCAGCGCACAC
TRBV7-9                        80          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCACCTTGGAGATCCAGCGCACAG
TRBV9                          81          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTGCACTCTGAACTAAACCTGAGCTCTCTG
TRBV10-1                       82          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCCCCTCACTCTGGAGTCTGCTG
TRBV10-2                       83          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCCCCCTCACTCTGGAGTCAGCTA
TRBV10-3                       84          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCCTCCTCACTCTGGAGTCCGCTA
TRBV(11-1, 11-3)               85          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCCACTCTCAAGATCCAGCCTGCAG
TRBV11-2                       86          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCTCCACTCTCAAGATCCAGCCTGCAA
TRBV(12-3, 12-4, 12-5)         87          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCCACTCTGAAGATCCAGCCCTCAG
TRBV13                         88          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCATTCTGAACTGAACATGAGCTCCTTGG
TRBV14                         89          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCTACTCTGAAGGTGCAGCCTGCAG
TRBV15                         90          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTGATAACTTCCAATCCAGGAGGCCGAACA
TRBV16                         91          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCTGTAGCCTTGAGATCCAGGCTACGA
TRBV17                         92          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCTTCCACGCTGAAGATCCATCCCG
TRBV18                         93          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTGCATCCTGAGGATCCAGCAGGTAG
TRBV19                         94          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCCTCTCACTGTGACATCGGCCC
TRBV20-1                       95          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCTTGTCCACTCTGACAGTGACCAGTG
TRBV23-1                       96          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCAGCCTGGCAATCCTGTCCTCAG
TRBV24-1                       97          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCTCCCTGTCCCTAGAGTCTGCCAT
TRBV25-1                       98          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCCCTGACCCTGGAGTCTGCCA
TRBV27                         99          CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCCCTGATCCTGGAGTCGCCCA
TRBV28                         100         CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCTCCCTGATTCTGGAGTCCGCCA
TRBV29-1                       101         CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCTAACATTCTCAACTCTGACTGTGAGCAACA
TRBV30                         102         CAAGCAGAAGACGGCATACGAGCTCTTCCGATCTCGGCAGTTCATCCTGAGTTCTAAGAAGC

TABLE 4  TCR-Jβ Reverse Primer Sequences

TRBJ gene segment  SEQ ID NO:  Primer sequence*
TRBJ1-1            103         AATGATACGGCGACCACCGAGATCTTTACCTACAACTGTGAGTCTGGTGCCTTGTCCAAA
TRBJ1-2            468         AATGATACGGCGACCACCGAGATCTACCTACAACGGTTAACCTGGTCCCCGAACCGAA
TRBJ1-3            104         AATGATACGGCGACCACCGAGATCTACCTACAACAGTGAGCCAACTTCCCTCTCCAAA
TRBJ1-4            105         AATGATACGGCGACCACCGAGATCTCCAAGACAGAGAGCTGGGTTCCACTGCCAAA
TRBJ1-5            484         AATGATACGGCGACCACCGAGATCTACCTAGGATGGAGAGTCGAGTCCCATCACCAAA
TRBJ1-6            106         AATGATACGGCGACCACCGAGATCTCTGTCACAGTGAGCCTGGTCCCGTTCCCAAA
TRBJ2-1            107         AATGATACGGCGACCACCGAGATCTCGGTGAGCCGTGTCCCTGGCCCGAA
TRBJ2-2            108         AATGATACGGCGACCACCGAGATCTCCAGTACGGTCAGCCTAGAGCCTTCTCCAAA
TRBJ2-3            109         AATGATACGGCGACCACCGAGATCTACTGTCAGCCGGGTGCCTGGGCCAAA
TRBJ2-4            110         AATGATACGGCGACCACCGAGATCTAGAGCCGGGTCCCGGCGCCGAA
TRBJ2-5            111         AATGATACGGCGACCACCGAGATCTGGAGCCGCGTGCCTGGCCCGAA
TRBJ2-6            112         AATGATACGGCGACCACCGAGATCTGTCAGCCTGCTGCCGGCCCCGAA
TRBJ2-7            113         AATGATACGGCGACCACCGAGATCTGTGAGCCTGGTGCCCGGCCCGAA

*Bold sequence indicates the universal R oligonucleotide for the sequence analysis.
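The relationship between the gene-specific primers of Tables 1 and 2 and the tailed primers of Tables 3 and 4 can be illustrated with a short Python sketch. The two tail sequences below are read off the 5′ ends of the Table 3 and Table 4 entries and stand in for Xn and Ym; the function and constant names are illustrative only and are not part of the described method.

```python
# Universal tails read off the 5' ends of the Table 3 (forward) and Table 4
# (reverse) primers; they play the roles of Xn and Ym in Tables 1 and 2.
XN_FORWARD_TAIL = "CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT"
YM_REVERSE_TAIL = "AATGATACGGCGACCACCGAGATCT"

VALID_BASES = set("ACGT")


def add_universal_tail(gene_specific: str, tail: str) -> str:
    """Return the full primer: universal tail at the 5' end, then the gene-specific part."""
    seq = gene_specific.upper()
    if not seq or set(seq) - VALID_BASES:
        raise ValueError(f"not a plain ACGT sequence: {gene_specific!r}")
    return tail + seq


if __name__ == "__main__":
    # TRBV2: Table 1 (SEQ ID NO: 1) -> Table 3 (SEQ ID NO: 58)
    print(add_universal_tail("TCAAATTTCACTCTGAAGATCCGGTCCACAA", XN_FORWARD_TAIL))
    # TRBJ1-1: Table 2 (SEQ ID NO: 46) -> Table 4 (SEQ ID NO: 103)
    print(add_universal_tail("TTACCTACAACTGTGAGTCTGGTGCCTTGTCCAAA", YM_REVERSE_TAIL))
```

Each printed sequence matches the corresponding Table 3 or Table 4 entry, because every tailed primer there is the same universal tail followed by the Table 1 or Table 2 sequence.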

The total PCR product for a rearranged TCRβ CDR3 region using this system is expected to be approximately 200 bp long. Genomic templates are PCR amplified using a pool of the 45 TCR Vβ F primers (the “VF pool”) and a pool of the thirteen TCR Jβ R primers (the “JR pool”). For example, 50 μl PCR reactions may be used with 1.0 μM VF pool (22 nM for each unique TCR Vβ F primer), 1.0 μM JR pool (77 nM for each unique TCRBJR primer), 1× QIAGEN Multiplex PCR master mix (QIAGEN part number 206145), 10% Q-solution (QIAGEN), and 16 ng/μl gDNA.
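The per-primer concentrations quoted above follow from splitting each 1.0 μM pool equally among its primers. The Python sketch below reproduces that arithmetic and, assuming the approximately 3 pg haploid genome mass cited in Example 1 (so about 6 pg per diploid cell), estimates how many genomes of template one 50 μl reaction contains. Variable and function names are illustrative.

```python
# Arithmetic for the example 50 µl multiplex reaction described above.
POOL_UM = 1.0            # total concentration of each primer pool (µM)
N_V_PRIMERS = 45         # TCR Vβ forward primers in the VF pool
N_J_PRIMERS = 13         # TCR Jβ reverse primers in the JR pool
GDNA_NG_PER_UL = 16.0    # template concentration (ng/µl)
VOLUME_UL = 50.0         # reaction volume (µl)
HAPLOID_GENOME_PG = 3.0  # approximate haploid genome mass (pg), per Example 1


def per_primer_nm(pool_um: float, n_primers: int) -> float:
    """Concentration of each primer in an equimolar pool, in nM."""
    return pool_um * 1000.0 / n_primers


def diploid_genomes_per_reaction(ng_per_ul: float, volume_ul: float) -> float:
    """Approximate number of diploid genomes (cells' worth) of template per reaction."""
    total_pg = ng_per_ul * volume_ul * 1000.0
    return total_pg / (2 * HAPLOID_GENOME_PG)


print(f"VF pool: {per_primer_nm(POOL_UM, N_V_PRIMERS):.0f} nM per primer")  # 22 nM
print(f"JR pool: {per_primer_nm(POOL_UM, N_J_PRIMERS):.0f} nM per primer")  # 77 nM
print(f"template: ~{diploid_genomes_per_reaction(GDNA_NG_PER_UL, VOLUME_UL):,.0f} diploid genomes")
```

At 16 ng/μl, one 50 μl reaction holds 800 ng of gDNA, on the order of 1.3 × 10⁵ diploid genomes under the stated mass assumption.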

The IGH primer set was designed to accommodate the potential for somatic hypermutation within the rearranged IGH genes, as is observed after initial stimulation of naïve B cells. Consequently, all primers were designed to be slightly longer than normal, and to anchor the 3′ end of each primer in a highly conserved sequence of three or more nucleotides that should be resistant to both functional and non-functional somatic mutations.

The IGHJ reverse primers were designed to anchor the 3′ end of each PCR primer on a highly conserved GGGG sequence motif within the IGHJ segments. The IGHJ segment sequences are shown in Table 5. Underlined sequences are the ten base pairs inward from the RSS that may be deleted; these were excluded from barcode design. Bold sequence is the reverse complement of the IGH J reverse PCR primers. Italicized sequence is the barcode for J identity (eight barcodes reveal six genes, and two alleles within genes). Further sequence within the underlined segment may reveal additional allelic identities.

TABLE 5

IgH J segment    SEQ ID NO:  Sequence
>IGHJ4*01/1-48   452                        ACTACTTTGACTACTGGGGCCAAGGAACCCTGGTCACCGTCTCCTCAG
>IGHJ4*03/1-48   453                        GCTACTTTGACTACTGGGGCCAAGGGACCCTGGTCACCGTCTCCTCAG
>IGHJ4*02/1-48   454                        ACTACTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG
>IGHJ3*01/1-50   455                      TGATGCTTTTGATGTCTGGGGCCAAGGGACAATGGTCACCGTCTCTTCAG
>IGHJ3*02/1-50   456                      TGATGCTTTTGATATCTGGGGCCAAGGGACAATGGTCACCGTCTCTTCAG
>IGHJ6*01/1-63   457         ATTACTACTACTACTACGGTATGGACGTCTGGGGGCAAGGGACCACGGTCACCGTCTCCTCAG
>IGHJ6*02/1-62   458         ATTACTACTACTACTACGGTATGGACGTCTGGGGCCAAGGGACCACGGTCACCGTCTCCTCAG
>IGHJ6*04/1-63   459         ATTACTACTACTACTACGGTATGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAG
>IGHJ6*03/1-62   460         ATTACTACTACTACTACTACATGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAG
>IGHJ2*01/1-53   461                   CTACTGGTACTTCGATCTCTGGGGCCGTGGCACCCTGGTCACTGTCTCCTCAG
>IGHJ5*01/1-51   462                     ACAACTGGTTCGACTCCTGGGGCCAAGGAACCCTGGTCACCGTCTCCTCAG
>IGHJ5*02/1-51   463                     ACAACTGGTTCGACCCCTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG
>IGHJ1*01/1-52   464                    GCTGAATACTTCCAGCACTGGGGCCAGGGCACCCTGGTCACCGTCTCCTCAG
>IGHJ2P*01/1-61  465           CTACAAGTGCTTGGAGCACTGGGGCAGGGCAGCCCGGACACCGTCTCCCTGGGAACGTCAG
>IGHJ1P*01/1-54  466                  AAAGGTGCTGGGGGTCCCCTGAACCCGACCCGCCCTGAGACCGCAGCCACATCA
>IGHJ3P*01/1-52  467                    CTTGCGGTTGGACTTCCCAGCCGACAGTGGTGGTCTGGCTTCTGAGGGGTCA

Sequences of the IGHJ reverse PCR primers are shown in Table 6.

TABLE 6

IgH J segment  SEQ ID NO:  Sequence
>IGHJ4_1       421         TGAGGAGACGGTGACCAGGGTTCCTTGGCCC
>IGHJ4_3       422         TGAGGAGACGGTGACCAGGGTCCCTTGGCCC
>IGHJ4_2       423         TGAGGAGACGGTGACCAGGGTTCCCTGGCCC
>IGHJ3_12      424         CTGAAGAGACGGTGACCATTGTCCCTTGGCCC
>IGHJ6_1       425         CTGAGGAGACGGTGACCGTGGTCCCTTGCCCC
>IGHJ6_2       426         TGAGGAGACGGTGACCGTGGTCCCTTGGCCC
>IGHJ6_34      427         CTGAGGAGACGGTGACCGTGGTCCCTTTGCCC
>IGHJ2_1       428         CTGAGGAGACAGTGACCAGGGTGCCACGGCCC
>IGHJ5_1       429         CTGAGGAGACGGTGACCAGGGTTCCTTGGCCC
>IGHJ5_2       430         CTGAGGAGACGGTGACCAGGGTTCCCTGGCCC
>IGHJ1_1       431         CTGAGGAGACGGTGACCAGGGTGCCCTGGCCC

V primers were designed in a conserved region of FR2 between the two conserved tryptophan (W) codons.

The primer sequences are anchored at the 3′ end on a tryptophan codon for all IGHV families that conserve this codon. This allows the last three nucleotides (tryptophan's TGG) to anchor on sequence that is expected to be resistant to somatic hypermutation, providing a 3′ anchor of five out of six nucleotides for each primer. The upstream sequence is extended further than normal and includes degenerate nucleotides to allow for mismatches induced by hypermutation (or between closely related IGH V families) without dramatically changing the annealing characteristics of the primer, as shown in Table 7. The sequences of the V gene segments are SEQ ID NOS:262-420.

TABLE 7

IgH V segment  SEQ ID NO:  Sequence
>IGHV1         443         TGGGTGCACCAGGTCCANGNACAAGGGCTTGAGTGG
>IGHV2         444         TGGGTGCGACAGGCTCGNGNACAACGCCTTGAGTGG
>IGHV3         445         TGGGTGCGCCAGATGCCNGNGAAAGGCCTGGAGTGG
>IGHV4         446         TGGGTCCGCCAGSCYCCNGNGAAGGGGCTGGAGTGG
>IGHV5         447         TGGGTCCGCCAGGCTCCNGNAAAGGGGCTGGAGTGG
>IGHV6         448         TGGGTCTGCCAGGCTCCNGNGAAGGGGCAGGAGTGG
>IGH7_3.25p    449         TGTGTCCGCCAGGCTCCAGGGAATGGGCTGGAGTTGG
>IGH8_3.54p    450         TCAGATTCCCAAGCTCCAGGGAAGGGGCTGGAGTGAG
>IGH9_3.63p    451         TGGGTCAATGAGACTCTAGGGAAGGGGCTGGAGGGAG
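The degenerate positions in Table 7 use standard IUPAC ambiguity codes (for example, N = any base, S = C or G, Y = C or T). As a sketch, the Python functions below count and enumerate the concrete oligonucleotides that a degenerate primer represents; the helper names are illustrative, and the IGHV4 primer is written as one contiguous string.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides represented by a degenerate primer."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n


def expand(primer: str) -> list:
    """Enumerate every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


ighv4 = "TGGGTCCGCCAGSCYCCNGNGAAGGGGCTGGAGTGG"  # Table 7, SEQ ID NO: 446
variants = expand(ighv4)
print(degeneracy(ighv4), len(variants))           # S, Y, N, N -> 2 * 2 * 4 * 4 = 64
print(all(v.endswith("TGG") for v in variants))   # the 3' tryptophan codon is fixed
```

Because all degenerate positions lie upstream, every expanded variant keeps the same 3′ TGG anchor, which is the design property described above.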

Thermal cycling conditions may follow methods known to those skilled in the art. For example, using a PCR Express thermal cycler (Hybaid, Ashford, UK), the following cycling conditions may be used: 1 cycle at 95° C. for 15 minutes; 25 to 40 cycles of 94° C. for 30 seconds, 59° C. for 30 seconds and 72° C. for 1 minute; followed by one cycle at 72° C. for 10 minutes.

Sequencing

Sequencing is achieved using a set of sequencing oligonucleotides that hybridize to a defined region within the amplified DNA molecules.

Preferably, the amplified J gene segments each have a unique four base tag at positions +11 through +14 downstream from the RSS site. Accordingly, the sequencing oligonucleotides hybridize adjacent to a four base tag within the amplified Jβ gene segments at positions +11 through +14 downstream of the RSS site.

For example, sequencing oligonucleotides for TCRB may be designed to anneal to a consensus nucleotide motif observed just downstream of this “tag”, so that the first four bases of a sequence read will uniquely identify the J segment (Table 8).

TABLE 8  Sequencing oligonucleotides

Sequencing oligonucleotide  SEQ ID NO:  Oligonucleotide sequence
Jseq 1-1                    470         ACAACTGTGAGTCTGGTGCCTTGTCCAAAGAAA
Jseq 1-2                    471         ACAACGGTTAACCTGGTCCCCGAACCGAAGGTG
Jseq 1-3                    472         ACAACAGTGAGCCAACTTCCCTCTCCAAAATAT
Jseq 1-4                    473         AAGACAGAGAGCTGGGTTCCACTGCCAAAAAAC
Jseq 1-5                    474         AGGATGGAGAGTCGAGTCCCATCACCAAAATGC
Jseq 1-6                    475         GTCACAGTGAGCCTGGTCCCGTTCCCAAAGTGG
Jseq 2-1                    476         AGCACGGTGAGCCGTGTCCCTGGCCCGAAGAAC
Jseq 2-2                    477         AGTACGGTCAGCCTAGAGCCTTCTCCAAAAAAC
Jseq 2-3                    478         AGCACTGTCAGCCGGGTGCCTGGGCCAAAATAC
Jseq 2-4                    479         AGCACTGAGAGCCGGGTCCCGGCGCCGAAGTAC
Jseq 2-5                    480         AGCACCAGGAGCCGCGTGCCTGGCCCGAAGTAC
Jseq 2-6                    481         AGCACGGTCAGCCTGCTGCCGGCCCCGAAAGTC
Jseq 2-7                    482         GTGACCGTGAGCCTGGTGCCCGGCCCGAAGTAC
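Because the sequencing oligonucleotides are placed so that each read begins with the J-specific four base tag, assigning a read to its J segment reduces to a lookup on the first four bases. In this Python sketch the tag values are hypothetical placeholders (the actual tags are not listed here) and exact matching is assumed; a production pipeline might tolerate sequencing errors within the tag.

```python
from collections import Counter
from typing import Optional

# HYPOTHETICAL 4-base tags for illustration only; real tags would be read from
# positions +11 to +14 downstream of the RSS in each reference Jβ sequence.
HYPOTHETICAL_J_TAGS = {
    "ACGT": "TRBJ1-1",
    "TGCA": "TRBJ1-2",
    "GATC": "TRBJ2-7",
}


def assign_j_segment(read: str, tags: dict) -> Optional[str]:
    """Return the J segment whose tag equals the first four bases of the read."""
    return tags.get(read[:4].upper())


reads = ["ACGTTTCGGACC", "GATCAAGGTT", "NNNNACGT", "acgtGGCC"]
counts = Counter(assign_j_segment(r, HYPOTHETICAL_J_TAGS) for r in reads)
print(counts)  # reads with no matching tag are counted under None
```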

The information used to assign the J and V segment of a sequence read is entirely contained within the amplified sequence, and does not rely upon the identity of the PCR primers. These sequencing oligonucleotides were selected such that promiscuous priming of a sequencing reaction for one J segment by an oligonucleotide specific to another J segment would generate sequence data starting at exactly the same nucleotide as sequence data from the correct sequencing oligonucleotide. In this way, promiscuous annealing of the sequencing oligonucleotides did not impact the quality of the sequence data generated.

The average length of the CDR3 region, defined as the nucleotides between the second conserved cysteine of the V segment and the conserved phenylalanine of the J segment, is 35 ± 3 nucleotides, so sequences starting from the Jβ segment tag will nearly always capture the complete V-D-J junction in a 50 base pair read.

TCR βJ gene segments are roughly 50 base pairs in length. PCR primers that anneal and extend to mismatched sequences are referred to as promiscuous primers. The TCR Jβ reverse PCR primers were designed to minimize overlap with the sequencing oligonucleotides, so as to minimize promiscuous priming in the context of multiplex PCR. The 13 TCR Jβ reverse primers are anchored at the 3′ end on the consensus splice site motif, with minimal overlap of the sequencing primers. The TCR Jβ primers were designed for a consistent annealing temperature, as calculated with the OligoCalc program under default parameters.

For the sequencing reaction, the IGHJ sequencing primers extend three nucleotides across the conserved CAG sequences as shown in Table 9.

TABLE 9

IgH J segment  SEQ ID NO:  Sequence
>IGHJSEQ4_1    432         TGAGGAGACGGTGACCAGGGTTCCTTGGCCCCAG
>IGHJSEQ4_3    433         TGAGGAGACGGTGACCAGGGTCCCTTGGCCCCAG
>IGHJSEQ4_2    434         TGAGGAGACGGTGACCAGGGTTCCCTGGCCCCAG
>IGHJSEQ3_12   435         CTGAAGAGACGGTGACCATTGTCCCTTGGCCCCAG
>IGHJSEQ6_1    436         CTGAGGAGACGGTGACCGTGGTCCCTTGCCCCCAG
>IGHJSEQ6_2    437         TGAGGAGACGGTGACCGTGGTCCCTTGGCCCCAG
>IGHJSEQ6_34   438         CTGAGGAGACGGTGACCGTGGTCCCTTTGCCCCAG
>IGHJSEQ2_1    439         CTGAGGAGACAGTGACCAGGGTGCCACGGCCCCAG
>IGHJSEQ5_1    440         CTGAGGAGACGGTGACCAGGGTTCCTTGGCCCCAG
>IGHJSEQ5_2    441         CTGAGGAGACGGTGACCAGGGTTCCCTGGCCCCAG
>IGHJSEQ1_1    442         CTGAGGAGACGGTGACCAGGGTGCCCTGGCCCCAG

Processing Sequence Data

For rapid analysis of sequencing results, an algorithm can be developed by one of ordinary skill. A preferred method is as follows.

The use of a PCR step to amplify the TCRβ CDR3 regions prior to sequencing could introduce a systematic bias in the inferred relative abundance of the sequences, due to differences in the efficiency of PCR amplification of CDR3 regions utilizing different Vβ and Jβ gene segments. Each cycle of PCR amplification potentially introduces a bias of average magnitude 1.5^(1/15) = 1.027. Thus, 25 cycles of PCR introduce a total bias of average magnitude 1.027²⁵ = 1.95 in the inferred relative abundance of distinct CDR3 region sequences.
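The per-cycle bias compounds multiplicatively, which a few lines of Python can confirm. The stated 1.95 follows from the rounded per-cycle factor 1.027; without intermediate rounding, the 25-cycle total is 1.5^(25/15) ≈ 1.97.

```python
per_cycle = 1.5 ** (1 / 15)       # average per-cycle bias stated in the text
print(round(per_cycle, 3))        # 1.027

# Total over 25 cycles using the rounded per-cycle factor, as in the text
print(round(1.027 ** 25, 2))      # 1.95

# Same total without intermediate rounding, i.e. 1.5 ** (25 / 15)
print(round(per_cycle ** 25, 2))  # 1.97
```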

Sequenced reads were filtered for those including CDR3 sequences. Sequencer data processing involves a series of steps to remove errors in the primary sequence of each read, and to compress the data. A complexity filter removes approximately 20% of the sequences, which are misreads from the sequencer. Then, sequences were required to have a minimum six-base match to both one of the thirteen TCRB J-regions and one of 54 V-regions. When the filter was applied to the control lane containing phage sequence, on average only one sequence in 7-8 million passed these steps. Finally, a nearest neighbor algorithm was used to collapse the data into unique sequences by merging closely related sequences, in order to remove both PCR error and sequencing error.
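A minimal Python sketch of these pre-processing steps is given below. The six-base match requirement comes from the text; the complexity criterion (rejecting reads in which one base makes up more than half the read) and the merge rule (fold a sequence into a more abundant one within Hamming distance 1) are hypothetical stand-ins, since the exact filter and nearest-neighbor criteria are not specified here. The reference and read sequences are also hypothetical.

```python
from collections import Counter

K = 6  # minimum exact match to a V and to a J reference, per the text


def kmers(seq: str, k: int = K) -> set:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def passes_complexity(read: str, max_fraction: float = 0.5) -> bool:
    """HYPOTHETICAL complexity filter: reject empty reads or reads dominated by one base."""
    if not read:
        return False
    most_common = Counter(read).most_common(1)[0][1]
    return most_common / len(read) <= max_fraction


def matches_reference(read: str, refs: list) -> bool:
    """True if the read shares at least one exact K-mer with any reference."""
    read_kmers = kmers(read)
    return any(read_kmers & kmers(ref) for ref in refs)


def hamming(a: str, b: str) -> int:
    """Mismatches over the shared length, plus any length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def collapse(counts: Counter, max_dist: int = 1) -> Counter:
    """Nearest-neighbour merge: fold each sequence into a more abundant kept one."""
    kept = Counter()
    for seq, n in counts.most_common():  # most abundant first
        parent = next((p for p in kept if hamming(seq, p) <= max_dist), None)
        kept[seq if parent is None else parent] += n
    return kept


def preprocess(reads, v_refs, j_refs) -> Counter:
    good = [r for r in reads
            if passes_complexity(r)
            and matches_reference(r, v_refs)
            and matches_reference(r, j_refs)]
    return collapse(Counter(good))


v_refs, j_refs = ["CAGCTGACGTAC"], ["GGTACCTTGCAT"]  # hypothetical references
reads = ["CAGCTGAAGGTACC"] * 5 + ["CAGCTGATGGTACC", "AAAAAAAAAAAAAA", "CAGCTGTCATCATC"]
print(preprocess(reads, v_refs, j_refs))
```

Here the single-mismatch read is merged into the abundant sequence, the homopolymer fails the complexity filter, and the last read is dropped for lacking a J match.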

In analyzing the data, the ratio of sequences in the PCR product must be derived by working backward from the sequence data before the true distribution of clonotypes in the blood can be estimated. For each sequence observed a given number of times in the data herein, the probability that the sequence was sampled from a PCR pool of a particular size is estimated. Because the CDR3 regions sequenced are sampled randomly from a massive pool of PCR products, the number of observations for each sequence is drawn from a Poisson distribution. The Poisson parameters are quantized according to the number of T cell genomes that provided the template for PCR. A simple Poisson mixture model both estimates these parameters and assigns a probability to each sequence being drawn from each distribution. This is an expectation maximization method, which reconstructs the abundance of each sequence that was drawn from the blood.

To estimate diversity, the “unseen species” formula is employed. To apply this formula, unique adaptive immune receptor (e.g., TCRB) clonotypes take the place of species. Suppose there is a total number S of TCRβ “species”, or clonotypes, and a sequencing experiment observes x_(s) copies of sequence s. For all of the unobserved clonotypes, x_(s) equals 0, and each TCR clonotype is “captured” in a blood draw according to a Poisson process with parameter λ_(s). The number of T cell genomes sequenced in the first measurement is normalized to 1, and that in the second measurement to t. Since there are a large number of unique sequences, an integral will represent the sum. If G(λ) is the empirical distribution function of the parameters λ₁, . . . , λ_(S), and n_(x) is the number of clonotypes sequenced exactly x times, then the expected value of n_(x) is given by the following formula, from which the total number of clonotypes S, i.e., the measure of diversity, is estimated:

${E\left( n_{x} \right)} = {S{\int_{0}^{\infty}{\left( \frac{^{- \lambda}\lambda^{x}}{x!} \right)\ {{{G(\lambda)}}.}}}}$

For a given experiment, in which T cells are sampled from some arbitrary source (e.g., a blood draw), the formula is used to estimate the total diversity of species in the entire source. The idea is that the sampled number of clonotypes at each size contains sufficient information to estimate the underlying distribution of clonotypes in the whole source. To derive the estimate, the number of new species expected if the exact measurement were repeated is first estimated, and the limit is then taken as the measurement is repeated an infinite number of times. The result is the expected number of species in the total underlying source population. The value of Δ(t), the number of new clonotypes observed in a second measurement, should be determined, preferably using the following equation:

${\Delta \; (t)} = {{{\sum\limits_{x}{E\left( n_{x} \right)}_{{{msmt}\; 1} + {{msmt}\; 2}}} - {\sum\limits_{x}{E\left( n_{x} \right)}_{{msmt}\; 1}}} = {S{\int_{0}^{\infty}{{^{- \lambda}\left( {1 - ^{{- \lambda}\; t}} \right)}\ {{G(\lambda)}}}}}}$

in which the subscripts msmt1 and msmt1+msmt2 refer to the clonotypes observed in measurement 1 alone and in measurements 1 and 2 combined, respectively. Taylor expansion of 1−e^(−λt) gives Δ(t) = E(n₁)t − E(n₂)t² + E(n₃)t³ − . . . , which can be approximated by replacing the expectations E(n_(x)) with the observed numbers in the first measurement. Using the numbers observed in the first measurement, this formula predicts that 1.6*10⁵ new unique sequences should be observed in the second measurement. The actual value in the second measurement was 1.8*10⁵ new TCRβ sequences, which implies that the prediction provided a valid lower bound on total diversity. An Euler transformation was used to regularize Δ(t) to produce a lower bound for Δ(∞).
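The truncated series can be evaluated directly from the first measurement's frequency counts n_x; this alternating form is known as the Good-Toulmin estimator. The Python sketch below applies it to hypothetical counts and, as a sanity check, confirms for a set of known hypothetical rates λ_s that the series built from E(n_x) reproduces S∫e^(−λ)(1−e^(−λt))dG(λ). It does not reproduce the 1.6*10⁵ figure, which depends on data not given here, and the Euler-transform regularization is omitted.

```python
import math


def predicted_new_clonotypes(freq_counts: dict, t: float) -> float:
    """Truncated series Δ(t) ≈ n_1 t - n_2 t^2 + n_3 t^3 - ...

    freq_counts maps x (times a clonotype was seen in measurement 1) to n_x.
    The raw alternating series behaves well for t <= 1; the Euler-transform
    regularization used to extrapolate towards Δ(∞) is not shown here.
    """
    return sum((-1) ** (x + 1) * n_x * t ** x for x, n_x in freq_counts.items() if x >= 1)


def exact_delta(lambdas: list, t: float) -> float:
    """S ∫ e^{-λ}(1 - e^{-λt}) dG(λ) for an empirical G over the given rates."""
    return sum(math.exp(-lam) * (1.0 - math.exp(-lam * t)) for lam in lambdas)


def expected_n(x: int, lambdas: list) -> float:
    """E(n_x) for an empirical G, evaluated in log space."""
    return sum(math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1)) for lam in lambdas)


# Hypothetical first-measurement frequency counts: n_1 = 100, n_2 = 20, n_3 = 5
print(predicted_new_clonotypes({1: 100, 2: 20, 3: 5}, t=1.0))  # 100 - 20 + 5 = 85.0

# With known (hypothetical) rates, the series built from E(n_x) matches the integral
lambdas = [0.2, 0.5, 1.0, 3.0]
series = predicted_new_clonotypes({x: expected_n(x, lambdas) for x in range(1, 60)}, t=0.5)
print(abs(series - exact_delta(lambdas, 0.5)) < 1e-9)  # True
```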

Using a Measurement of Diversity to Diagnose Disease

The measurement of diversity can be used to diagnose disease or the effects of a treatment, as follows. T cell and/or B cell receptor repertoires can be measured at various time points, e.g., after hematopoietic stem cell transplant (HSCT) treatment for leukemia. Both the change in diversity and the overall diversity of the TCRB repertoire can be utilized to measure immunocompetence. A standard for the expected rate of immune reconstitution after transplant can be utilized. The rate of change in diversity between any two time points may be used to actively modify treatment. The overall diversity at a fixed time point is also an important measure, as this standard can be used to compare between different patients. In particular, the overall diversity is the measure that should correlate with the clinical definition of immune reconstitution. This information may be used to modify prophylactic drug regimens of antibiotics, antivirals, and antifungals, e.g., after HSCT.

Immune reconstitution after allogeneic hematopoietic cell transplantation can be assessed by measuring changes in diversity. These techniques will also enhance the analysis of how lymphocyte diversity declines with age, as measured by analysis of T cell responses to vaccination. Further, the methods of the invention provide a means to evaluate investigational therapeutic agents (e.g., Interleukin-7 (IL-7)) that have a direct effect on the generation, growth, and development of αβ T cells. Moreover, application of these techniques to the study of thymic T cell populations will provide insight into the processes of both T cell receptor gene rearrangement and positive and negative selection of thymocytes.

A newborn that does not yet have a fully functioning immune system but may have maternally transmitted antibody is immunodeficient. A newborn is susceptible to a number of diseases until its immune system autonomously develops, and our measurement of the adaptive immune system will likely prove useful with newborn patients.

Lymphocyte diversity can be assessed in other states of congenital or acquired immunodeficiency. An AIDS patient with a failed or failing immune system can be monitored to determine the stage of disease, and to measure a patient's response to therapies aimed to reconstitute immunocompetence.

Another application of the methods of the invention is to provide diagnostic measures for solid organ transplant recipients taking medication so their body will not reject the donated organ. Generally, these patients are under immunosuppressive therapies. Monitoring the immunocompetence of the host will assist before and after transplantation.

Individuals exposed to radiation or chemotherapeutic drugs may undergo bone marrow transplantation or otherwise require replenishment of T cell populations, along with associated immunocompetence. The methods of the invention provide a means for qualitatively and quantitatively assessing the bone marrow graft, or reconstitution of lymphocytes in the course of these treatments.

One manner of determining diversity is by comparing at least two samples of genomic DNA, preferably in which one sample of genomic DNA is from a patient and the other sample is from a normal subject, or alternatively, in which one sample of genomic DNA is from a patient before a therapeutic treatment and the other sample is from the patient after treatment, or in which the two samples of genomic DNA are from the same patient at different times during treatment. Another manner of diagnosis may be based on the comparison of diversity among the samples of genomic DNA, e.g., in which the immunocompetence of a human patient is assessed by the comparison.

Biomarkers

Shared TCR sequences between individuals represent a new class of potential biomarkers for a variety of diseases, including cancers, autoimmune diseases, and infectious diseases. These correspond to the public T cells that have been reported for multiple human diseases. TCRs are useful as biomarkers because T cells are a result of clonal expansion, by which the immune system amplifies these biomarkers through rapid cell division. Following amplification, the TCRs are readily detected even if the target is small (e.g., an early stage tumor). TCRs are also useful as biomarkers because in many cases the T cells might additionally contribute causally to the disease and, therefore, could constitute a drug target. T cell self-interactions are thought to play a major role in several diseases associated with autoimmunity, e.g., multiple sclerosis, Type I diabetes, and rheumatoid arthritis.

EXAMPLES

Example 1 Sample Acquisition, PBMC Isolation, FACS Sorting and Genomic DNA Extraction

Peripheral blood samples from two healthy male donors aged 35 and 37 were obtained with written informed consent using forms approved by the Institutional Review Board of the Fred Hutchinson Cancer Research Center (FHCRC). Peripheral blood mononuclear cells (PBMC) were isolated by Ficoll-Hypaque® density gradient separation. The T-lymphocytes were flow sorted into four compartments for each subject: CD8⁺CD45RO⁻, CD8⁺CD45RO⁺, CD4⁺CD45RO⁻, and CD4⁺CD45RO⁺. For the characterization of lymphocytes the following conjugated anti-human antibodies were used: CD4 FITC (clone M-T466, Miltenyi Biotec), CD8 PE (clone RPA-T8, BD Biosciences), CD45RO ECD (clone UCHL-1, Beckman Coulter), and CD45RO APC (clone UCHL-1, BD Biosciences). Staining of total PBMCs was done with the appropriate combination of antibodies for 20 minutes at 4° C., and stained cells were washed once before analysis. Lymphocyte subsets were isolated by FACS sorting in the BD FACSAria™ cell-sorting system (BD Biosciences). Data were analyzed with FlowJo software (Treestar Inc.).

Total genomic DNA was extracted from sorted cells using the QIAamp® DNA blood Mini Kit (QIAGEN®). The approximate mass of a single haploid genome is 3 pg. In order to sample millions of rearranged TCRB in each T cell compartment, 6 to 27 micrograms of template DNA were obtained from each compartment (see Table 10).

TABLE 10 CD8+/ CD8+/ CD4+/ CD4+/ CD45RO− CD45RO+ CD45RO− CD45RO+ Donorcells 9.9 6.3 6.3 10 2 (×10⁶) DNA (μg) 27 13 19 25 PCR 25 25 30 30cycles clusters 29.3 27 102.3* 118.3* (K/tile) VJ 3.0 2.0 4.4 4.2sequences (×10⁶) Cells 4.9 4.8 3.3 9 1 DNA 12 13 6.6 19 PCR 30 30 30 30cycles Clusters 116.3 121 119.5 124.6 VJ 3.2 3.7 4.0 3.8 sequences CellsNA NA NA 0.03 PCR Bias DNA NA NA NA 0.015 assessment PCR NA NA NA 25 +15 cycles clusters NA NA NA 1.4/23.8 VJ NA NA NA 1.6 sequences
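The DNA quantities in Example 1 translate into genome copies by simple division. A Python sketch, assuming the approximately 3 pg haploid genome mass stated above and two haploid genomes per diploid cell; the function names are illustrative.

```python
HAPLOID_GENOME_PG = 3.0  # approximate mass of one haploid genome (pg), as stated above


def haploid_genomes(dna_ug: float) -> float:
    """Number of haploid genome copies in a given mass of genomic DNA (µg)."""
    return dna_ug * 1e6 / HAPLOID_GENOME_PG


def diploid_cells(dna_ug: float) -> float:
    """Equivalent number of diploid cells (two haploid genomes per cell)."""
    return haploid_genomes(dna_ug) / 2


for ug in (6, 27):  # range of template DNA obtained per compartment
    print(f"{ug} µg ≈ {haploid_genomes(ug):.1e} haploid genomes ≈ {diploid_cells(ug):.1e} cells")
```

On this basis, 6 to 27 μg corresponds to roughly 1 to 4.5 million cells' worth of template per compartment.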

Example 2 Virtual T Cell Receptor β Chain Spectratyping

Virtual TCR β chain spectratyping was performed as follows. Complementary DNA was synthesized from RNA extracted from sorted T cell populations and used as template for multiplex PCR amplification of the rearranged TCR β chain CDR3 region. Each multiplex reaction contained a 6-FAM-labeled antisense primer specific for the TCR β chain constant region, and two to five TCR β chain variable (TRBV) gene-specific sense primers. All 23 functional Vβ families were studied. PCR reactions were carried out on a Hybaid PCR Express thermal cycler (Hybaid, Ashford, UK) under the following cycling conditions: 1 cycle at 95° C. for 6 minutes, 40 cycles at 94° C. for 30 seconds, 58° C. for 30 seconds, and 72° C. for 40 seconds, followed by 1 cycle at 72° C. for 10 minutes. Each reaction contained cDNA template, 500 μM dNTPs, 2 mM MgCl₂ and 1 unit of AmpliTaq Gold DNA polymerase (Perkin Elmer) in AmpliTaq Gold buffer, in a final volume of 20 μl. After completion, an aliquot of the PCR product was diluted 1:50 and analyzed using a DNA analyzer. The output of the DNA analyzer was converted to a distribution of fluorescence intensity vs. length by comparison with the fluorescence intensity trace of a reference sample containing known size standards.
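The final conversion step, from the DNA analyzer's trace to a fluorescence-versus-length distribution, amounts to calibrating migration against the known size standards. The Python sketch below uses simple piecewise-linear interpolation between neighbouring standards; this sizing model is an assumption for illustration (instrument software may use other models), and the ladder and trace values are hypothetical.

```python
from bisect import bisect_right


def make_sizer(standards):
    """Return a function mapping migration time to fragment length (bp).

    standards: (migration_time, length_bp) pairs for the reference ladder,
    sorted by migration time. Uses piecewise-linear interpolation between
    neighbouring standards, extrapolating linearly beyond the end points.
    """
    times = [t for t, _ in standards]
    sizes = [s for _, s in standards]

    def to_bp(t: float) -> float:
        i = bisect_right(times, t)
        i = min(max(i, 1), len(times) - 1)  # pick the bracketing (or nearest) segment
        t0, t1 = times[i - 1], times[i]
        s0, s1 = sizes[i - 1], sizes[i]
        return s0 + (s1 - s0) * (t - t0) / (t1 - t0)

    return to_bp


def length_distribution(trace, to_bp):
    """Convert (migration_time, intensity) samples to (length_bp, intensity) pairs."""
    return [(to_bp(t), intensity) for t, intensity in trace]


# Hypothetical ladder and sample trace, for illustration only
to_bp = make_sizer([(100.0, 100), (200.0, 200), (300.0, 400)])
print(length_distribution([(150.0, 5.0), (250.0, 8.0)], to_bp))
```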

Example 3 Multiplex PCR Amplification of TCRβ CDR3 Regions

The CDR3 junction region was defined operationally, as follows. The junction begins with the second conserved cysteine of the V-region and ends with the conserved phenylalanine of the J-region. Taking the reverse complements of the observed sequences and translating the flanking regions, the amino acids defining the junction boundaries were identified. The number of nucleotides between these boundaries determines the length and therefore the frame of the CDR3 region. In order to generate the template library for sequencing, a multiplex PCR system was selected to amplify rearranged TCRβ loci from genomic DNA. The multiplex PCR system uses 45 forward primers (Table 3), each specific to a functional TCR Vβ segment, and thirteen reverse primers (Table 4), each specific to a TCR Jβ segment. The primers were selected so that adequate information is present within the amplified sequence to identify both the V and J genes uniquely (>40 base pairs of sequence upstream of the V gene recombination signal sequence (RSS), and >30 base pairs downstream of the J gene RSS).
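A simplified version of this junction definition can be sketched in Python: each read and its reverse complement are translated in all three frames, and the junction is taken from a cysteine to the phenylalanine of the J-region Phe-Gly-X-Gly (FGXG) motif. This is an illustrative approximation, not the exact procedure: a full implementation would anchor on the conserved cysteine of the identified V segment rather than the first in-frame cysteine, and this sketch only reports junctions that are in frame with both boundaries (out-of-frame rearrangements return None). The example read is hypothetical.

```python
import re
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (bases T, C, A, G)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Simplified junction: first in-frame Cys up to the Phe of the J-region FGXG motif,
# with no stop codon in between.
JUNCTION = re.compile(r"C[^*]*?F(?=G.G)")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset frame; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))


def find_junction(read: str):
    """Return (strand, frame, junction_aa, junction_nt_length), or None if not found."""
    for strand, seq in (("+", read.upper()), ("-", reverse_complement(read))):
        for frame in range(3):
            match = JUNCTION.search(translate(seq, frame))
            if match:
                junction = match.group()
                return strand, frame, junction, 3 * len(junction)
    return None


read = "TGTGCCAGCAGCTTAGGGGAGCAGTTCTTCGGGCCAGGG"  # hypothetical rearranged read
print(find_junction(read))                      # ('+', 0, 'CASSLGEQFF', 30)
print(find_junction(reverse_complement(read)))  # ('-', 0, 'CASSLGEQFF', 30)
```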

The forward primers are modified at the 5′ end with the universal forward primer sequence compatible with the Illumina GA2 cluster station solid-phase PCR. Similarly, all of the reverse primers are modified with the GA2 universal reverse primer sequence. The 3′ end of each forward primer is anchored at position −43 in the Vβ segment, relative to the recombination signal sequence (RSS), thereby providing a unique Vβ tag sequence within the amplified region. The thirteen reverse primers specific to each Jβ segment are anchored in the 3′ intron, with the 3′ end of each primer crossing the intron/exon junction. Thirteen sequencing primers were designed that are complementary to the amplified portion of the Jβ segments, such that the first few bases of sequence generated will capture the unique Jβ tag sequence.

On average, J deletions were 4 bp ± 2.5 bp, which implies that J deletions greater than 10 nucleotides occur in less than 1% of sequences. The thirteen different TCR Jβ gene segments each had a unique four base tag at positions +11 through +14 downstream of the RSS site; because deletions rarely extend beyond position +10, this tag is nearly always retained. Thus, sequencing oligonucleotides were designed to anneal to a consensus nucleotide motif observed just downstream of this “tag”, so that the first four bases of a sequence read will uniquely identify the J segment (Table 8).
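As a rough consistency check on the "less than 1%" figure, J-deletion length can be modelled as normally distributed with mean 4 bp and standard deviation 2.5 bp. This is a simplifying assumption (deletion lengths are discrete and non-negative). Under it, a deletion longer than 10 bp lies 2.4 standard deviations above the mean, an upper-tail probability of about 0.8%.

```python
import math

MEAN_BP = 4.0      # mean J deletion length (from the text)
SD_BP = 2.5        # standard deviation (from the text)
THRESHOLD_BP = 10  # deletions longer than this would reach the tag at +11..+14


def normal_upper_tail(x: float, mean: float, sd: float) -> float:
    """P(X > x) for X ~ Normal(mean, sd); an assumed model for deletion length."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


z = (THRESHOLD_BP - MEAN_BP) / SD_BP
p = normal_upper_tail(THRESHOLD_BP, MEAN_BP, SD_BP)
print(f"z = {z:.1f}, P(deletion > {THRESHOLD_BP} bp) ≈ {p:.4f}")
```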

The information used to assign the J and V segment of a sequence read is entirely contained within the amplified sequence, and does not rely upon the identity of the PCR primers. These sequencing oligonucleotides were selected such that promiscuous priming of a sequencing reaction for one J segment by an oligonucleotide specific to another J segment would generate sequence data starting at exactly the same nucleotide as sequence data from the correct sequencing oligonucleotide. In this way, promiscuous annealing of the sequencing oligonucleotides did not impact the quality of the sequence data generated.

The average length of the CDR3 region, defined following convention as the nucleotides between the second conserved cysteine of the V segment and the conserved phenylalanine of the J segment, is 35 ± 3 nucleotides, so sequences starting from the Jβ segment tag will nearly always capture the complete VNDNJ junction in a 50 bp read.

TCR βJ gene segments are roughly 50 bp in length. PCR primers that anneal and extend to mismatched sequences are referred to as promiscuous primers. Because of the risk of promiscuous priming in the context of multiplex PCR, especially in the context of a gene family, the TCR Jβ reverse PCR primers were designed to minimize overlap with the sequencing oligonucleotides. Thus, the 13 TCR Jβ reverse primers are anchored at the 3′ end on the consensus splice site motif, with minimal overlap of the sequencing primers. The TCR Jβ primers were designed for a consistent annealing temperature (58° C. in 50 mM salt) using the OligoCalc program under default parameters (http://www.basic.northwestern.edu/biotools/oligocalc.html).

The 45 TCR Vβ forward primers were designed to anneal to the Vβ segments in a region of relatively strong sequence conservation between Vβ segments, for two express purposes. First, maximizing the conservation of sequence among these primers minimizes the potential for differential annealing properties of each primer. Second, the primers were chosen such that the amplified region between V and J primers will contain sufficient TCR Vβ sequence information to identify the specific Vβ gene segment used. This obviates the risk of erroneous TCR Vβ gene segment assignment, in the event of promiscuous priming by the TCR Vβ primers. TCR Vβ forward primers were designed for all known non-pseudogenes in the TCRβ locus.

The total PCR product for a successfully rearranged TCRβ CDR3 region using this system is expected to be approximately 200 bp long. Genomic templates were PCR amplified using an equimolar pool of the 45 TCR Vβ F primers (the “VF pool”) and an equimolar pool of the thirteen TCR Jβ R primers (the “JR pool”). 50 μl PCR reactions were set up at 1.0 μM VF pool (22 nM for each unique TCR Vβ F primer), 1.0 μM JR pool (77 nM for each unique TCRBJR primer), 1× QIAGEN Multiplex PCR master mix (QIAGEN part number 206145), 10% Q-solution (QIAGEN), and 16 ng/μl gDNA. Thermal cycling was performed in a PCR Express thermal cycler (Hybaid, Ashford, UK) under the following conditions: 1 cycle at 95° C. for 15 minutes; 25 to 40 cycles of 94° C. for 30 seconds, 59° C. for 30 seconds, and 72° C. for 1 minute; and a final cycle at 72° C. for 10 minutes. Twelve to twenty wells of PCR were performed for each library in order to sample hundreds of thousands to millions of rearranged TCRβ CDR3 loci.
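The per-primer concentrations and the template input quoted above follow from simple arithmetic, sketched below. The mass of one diploid human genome (`pg_per_genome`, about 6.6 pg) is an assumed approximate value that is not given in the source. The resulting genome count is only an upper bound on rearranged TCRβ templates, because not every genome contributes an amplifiable rearrangement.

```python
def reaction_setup(n_v=45, n_j=13, pool_uM=1.0, volume_ul=50.0,
                   gdna_ng_per_ul=16.0, pg_per_genome=6.6):
    """Per-primer concentrations and template input for one PCR well.

    pg_per_genome is an assumed approximate mass of one diploid human
    genome; it is not stated in the source.
    """
    per_v_nM = pool_uM * 1000.0 / n_v           # equimolar split of the VF pool
    per_j_nM = pool_uM * 1000.0 / n_j           # equimolar split of the JR pool
    gdna_ng = gdna_ng_per_ul * volume_ul        # total template mass per well
    genomes = gdna_ng * 1000.0 / pg_per_genome  # ng -> pg, then genome count
    return {"per_v_nM": per_v_nM, "per_j_nM": per_j_nM,
            "gdna_ng": gdna_ng, "genomes_per_well": genomes}


if __name__ == "__main__":
    setup = reaction_setup()
    print(f"{setup['per_v_nM']:.1f} nM per V primer, "
          f"{setup['per_j_nM']:.1f} nM per J primer")
    for wells in (12, 20):
        print(wells, "wells ->", round(wells * setup["genomes_per_well"]), "genomes")
```

Under this assumption, each well holds about 800 ng of gDNA, or roughly 1.2 × 10⁵ genomes. Twelve to twenty wells would then cover roughly 1.5 × 10⁶ to 2.4 × 10⁶ genomes, the same order of magnitude as the sampling goal stated above.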

Example 4 Pre-Processing of Sequence Data

Sequencer data processing involves a series of steps to remove errors in the primary sequence of each read and to compress the data. First, a complexity filter removes approximately 20% of the sequences, which are misreads from the sequencer. Then, sequences were required to have a minimum six-base match to both one of the thirteen J regions and one of the 54 V regions. When these filters were applied to the control lane containing phage sequence, on average only one sequence in 7-8 million passed, indicating a very low false-positive rate. Finally, a nearest neighbor algorithm was used to collapse the data into unique sequences by merging closely related sequences, in order to remove both PCR error and sequencing error (see Table 10).
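The J/V match filter and the nearest-neighbor collapse can be sketched as follows. This is a simplified illustration, not the authors' pipeline. It assumes that a "six-base match" means sharing at least one exact 6-mer with a reference segment, and that collapsing folds each sequence into the most abundant equal-length sequence within a fixed Hamming distance. All function names are hypothetical. Production error-correction schemes often add an abundance-ratio requirement before merging, which is omitted here.

```python
def shares_kmer(read: str, reference: str, k: int = 6) -> bool:
    """True if read and reference share at least one exact k-mer."""
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return any(read[i:i + k] in ref_kmers for i in range(len(read) - k + 1))


def passes_segment_filter(read, j_refs, v_refs, k=6):
    """Keep a read only if it matches some J and some V reference segment."""
    return (any(shares_kmer(read, j, k) for j in j_refs)
            and any(shares_kmer(read, v, k) for v in v_refs))


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_nearest_neighbor(counts: dict, max_mismatches: int = 1) -> dict:
    """Merge each sequence into the most abundant equal-length kept sequence
    within max_mismatches; otherwise keep it as a new unique sequence."""
    kept = {}
    # Process the most abundant sequences first, so rare error variants fold
    # into their likely true parent (dict order preserves abundance order).
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        parent = next((p for p in kept
                       if len(p) == len(seq) and hamming(p, seq) <= max_mismatches),
                      None)
        if parent is None:
            kept[seq] = n
        else:
            kept[parent] += n
    return kept


if __name__ == "__main__":
    reads = {"AAAAAAAA": 100, "AAAAAAAT": 2, "CCCCAAAA": 50, "GGGG": 3}
    print(collapse_nearest_neighbor(reads))
```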

Example 5 Estimating Relative CDR3 Sequence Abundance in PCR Pools and Blood Samples

After collapsing the data, the underlying distribution of T-cell sequences in the blood was reconstructed from the sequence data. The procedure used three steps: 1) flow sorting T cells drawn from peripheral blood, 2) PCR amplification, and 3) sequencing. To analyze the data, the ratio of sequences in the PCR product must therefore be derived by working backward from the sequence data before the true distribution of clonotypes in the blood can be estimated.

For each sequence observed a given number of times in the data herein, the probability that the sequence was sampled from a PCR pool of a particular size is estimated. Because the CDR3 regions sequenced are sampled randomly from a massive pool of PCR products, the number of observations for each sequence is drawn from a Poisson distribution. The Poisson parameters are quantized according to the number of T cell genomes that provided the template for PCR. A simple Poisson mixture model both estimates these parameters and assigns each sequence a probability of having been drawn from each distribution. This is an expectation-maximization method that reconstructs the abundance of each sequence drawn from the blood.
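The quantized Poisson mixture can be sketched with a short expectation-maximization loop. The sketch assumes that component k (k = 1..K) corresponds to k template genomes, with mean k·base, so that a single per-genome rate `base` and the mixing weights are the only parameters. The number of components, the initialization, and the name `fit_quantized_poisson_mixture` are illustrative choices, not taken from the source.

```python
import math


def poisson_logpmf(x: int, mu: float) -> float:
    """Log of the Poisson probability of observing x counts with mean mu."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def fit_quantized_poisson_mixture(counts, n_components=3, n_iter=500, tol=1e-9):
    """EM for a Poisson mixture whose means are k * base, k = 1..n_components.

    counts: observed read count for each unique sequence.
    Returns (base, weights, expected_genomes); expected_genomes[i] is the
    posterior mean number of template genomes behind sequence i.
    """
    base = float(max(min(counts), 1))
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of component k (k + 1 genomes) for each count.
        resp = []
        for x in counts:
            logs = [math.log(max(w, 1e-300)) + poisson_logpmf(x, (k + 1) * base)
                    for k, w in enumerate(weights)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            z = sum(probs)
            resp.append([p / z for p in probs])
        # M-step: the shared per-genome rate has a closed form,
        # base = sum(x_i) / sum_i sum_k r_ik * k; weights are mean responsibilities.
        expected_k = sum(r[k] * (k + 1) for r in resp for k in range(n_components))
        new_base = sum(counts) / expected_k
        weights = [sum(r[k] for r in resp) / len(counts) for k in range(n_components)]
        converged = abs(new_base - base) < tol
        base = new_base
        if converged:
            break
    expected_genomes = [sum(r[k] * (k + 1) for k in range(n_components)) for r in resp]
    return base, weights, expected_genomes


if __name__ == "__main__":
    # Hypothetical counts: most sequences from 1 genome, fewer from 2 or 3.
    base, weights, genomes = fit_quantized_poisson_mixture([10] * 50 + [20] * 20 + [30] * 5)
    print(round(base, 2), [round(w, 3) for w in weights])
```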

Example 6 Unseen Species Model for Estimation of True Diversity

A mixture model can reconstruct the frequency of each TCRβ CDR3 species drawn from the blood, but a larger question is how many unique CDR3 species were present in the donor. This fundamental question needs to be answered because the available sample from each donor is limited. It will become even more important as these techniques are extrapolated to the smaller volumes of blood that can reasonably be drawn from patients undergoing treatment.

The mathematical formulation is as follows. Let S be the total number of TCRβ “species,” or clonotypes, and let a sequencing experiment observe x_s copies of sequence s. For every unobserved clonotype, x_s = 0, and each TCR clonotype is “captured” in a blood draw according to a Poisson process with parameter λ_s. The number of T cell genomes sequenced in the first measurement is normalized to 1, and the number in the second measurement is t. Because there are a large number of unique sequences, an integral can represent the sum. If G(λ) is the empirical distribution function of the parameters λ_1, ..., λ_S, and n_x is the number of clonotypes sequenced exactly x times, then

${E\left( n_{x} \right)} = {S{\int_{0}^{\infty}{\left( \frac{^{- \lambda}\lambda^{x}}{x!} \right)\ {{{G(\lambda)}}.}}}}$

The value Δ(t) is the expected number of new clonotypes observed in the second sequencing experiment:

${\Delta \; (t)} = {{{\sum\limits_{x}{E\left( n_{x} \right)}_{{\exp \; 1} + {\exp \; 2}}} - {\sum\limits_{x}{E\left( n_{x} \right)}_{\exp \; 1}}} = {S{\int_{0}^{\infty}{{^{- \lambda}\left( {1 - ^{{- \lambda}\; t}} \right)}\ {{G(\lambda)}}}}}}$

A Taylor expansion of 1 − e^(−λt) gives Δ(t) = E(n₁)t − E(n₂)t² + E(n₃)t³ − ..., which can be approximated by replacing the expectations E(n_x) with the numbers observed in the first measurement. Using the numbers observed in the first measurement, this formula predicts that 1.6 × 10⁵ new unique sequences should be observed in the second measurement. The actual number observed in the second measurement was 1.8 × 10⁵ new TCRβ sequences, which implies that the prediction provided a valid lower bound on total diversity. An Euler transformation was used to regularize Δ(t) to produce a lower bound for Δ(∞).
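The truncated series Δ(t) ≈ n₁t − n₂t² + n₃t³ − ... (the Good-Toulmin estimator) can be computed directly from observed clonotype counts, as sketched below. The function names are hypothetical. For t ≥ 1 the raw alternating series can become unstable or diverge, which is why the text applies an Euler transformation to regularize it. That regularization step is not implemented in this sketch.

```python
from collections import Counter


def frequency_of_frequencies(clone_counts: dict) -> Counter:
    """n_x: number of clonotypes observed exactly x times (x >= 1)."""
    return Counter(c for c in clone_counts.values() if c >= 1)


def good_toulmin_delta(n_x: dict, t: float, max_terms=None) -> float:
    """Truncated alternating series sum_x (-1)^(x+1) * n_x * t^x for Delta(t)."""
    total = 0.0
    for x in sorted(n_x):
        if x < 1:
            continue
        if max_terms is not None and x > max_terms:
            break
        total += (-1) ** (x + 1) * n_x[x] * t ** x
    return total


if __name__ == "__main__":
    # Hypothetical clonotype -> read-count table from a first lane.
    lane1 = {"CASSL": 1, "CASSP": 1, "CASRG": 2, "CATSD": 3}
    n = frequency_of_frequencies(lane1)
    print(good_toulmin_delta(n, 0.5))
```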

Example 7 Error Correction and Bias Assessment

Sequence error in the primary sequence data derives primarily from two sources: (1) nucleotide misincorporation that occurs during the amplification by PCR of TCRβ CDR3 template sequences, and (2) errors in base calls introduced during sequencing of the PCR-amplified library of CDR3 sequences. The large quantity of data allows us to implement a straightforward error-correcting code to correct most of the errors in the primary sequence data that are attributable to these two sources. After error correction, the number of unique, in-frame CDR3 sequences and the number of observations of each unique sequence were tabulated for each of the four flow-sorted T cell populations from the two donors. The relative frequency distribution of CDR3 sequences in the four flow cytometrically-defined populations demonstrated that antigen-experienced CD45RO⁺ populations contained significantly more unique CDR3 sequences with high relative frequency than the CD45RO⁻ populations. Frequency histograms of TCRβ CDR3 sequences observed in four different T cell subsets distinguished by expression of CD4, CD8, and CD45RO and present in blood showed that ten unique sequences were each observed 200 times in the CD4⁺CD45RO⁺ (antigen-experienced) T cell sample, which was more than twice as frequent as that observed in the CD4⁺CD45RO⁻ populations.

The use of a PCR step to amplify the TCRβ CDR3 regions prior to sequencing could potentially introduce a systematic bias in the inferred relative abundance of the sequences, due to differences in the efficiency of PCR amplification of CDR3 regions utilizing different Vβ and Jβ gene segments. To estimate the magnitude of any such bias, the TCRβ CDR3 regions from a sample of approximately 30,000 unique CD4⁺CD45RO⁺ T lymphocyte genomes were amplified through 25 cycles of PCR, at which point the PCR product was split in half. Half was set aside, and the other half of the PCR product was amplified for an additional 15 cycles of PCR, for a total of 40 cycles of amplification. The PCR products amplified through 25 and 40 cycles were then sequenced and compared. Over 95% of the 25-cycle sequences were also found in the 40-cycle sample, and a linear correlation is observed when comparing the frequency of sequences between these samples. For sequences observed a given number of times in the 25-cycle lane, a combination of PCR bias and sampling variance accounts for the variance around the mean of the number of observations at 40 cycles. Conservatively attributing the mean variation about the line (1.5-fold) entirely to PCR bias, each cycle of PCR amplification potentially introduces a bias of average magnitude 1.5^(1/15) = 1.027. Thus, the 25 cycles of PCR introduce a total bias of average magnitude 1.027²⁵ = 1.95 in the inferred relative abundance of distinct CDR3 region sequences.
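The bias arithmetic above can be reproduced as a geometric per-cycle factor compounded over the cycles of interest. This is a minimal sketch with hypothetical function names. Using the unrounded per-cycle factor (about 1.0274), the 25-cycle total is 1.5^(25/15) ≈ 1.97. Using the rounded factor 1.027 gives the 1.95 quoted in the text.

```python
def per_cycle_bias(total_fold: float, extra_cycles: int) -> float:
    """Geometric-mean fold bias per cycle implied by total_fold over extra_cycles."""
    return total_fold ** (1.0 / extra_cycles)


def compounded_bias(per_cycle: float, cycles: int) -> float:
    """Total fold bias after compounding a per-cycle bias over several cycles."""
    return per_cycle ** cycles


if __name__ == "__main__":
    b = per_cycle_bias(1.5, 15)                   # 1.5-fold spread over 15 extra cycles
    print(round(b, 4))                            # unrounded per-cycle factor
    print(round(compounded_bias(b, 25), 2))       # total over 25 cycles
    print(round(compounded_bias(1.027, 25), 2))   # with the rounded factor from the text
```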

Example 8 Jβ Gene Segment Usage

The CDR3 region in each TCR β chain includes sequence derived from one of the thirteen Jβ gene segments. Analysis of the CDR3 sequences in the four different T cell populations from the two donors demonstrated that the fraction of total sequences incorporating sequence derived from the thirteen different Jβ gene segments varied more than 20-fold. Jβ utilization among the four flow cytometrically-defined T cell populations from a single donor was relatively constant. Moreover, the Jβ usage patterns observed in the two donors, which were inferred from analysis of genomic DNA from T cells sequenced using the GA, are qualitatively similar to those observed in T cells from umbilical cord blood and from healthy adult donors. Both of the latter patterns were inferred from analysis of cDNA from T cells sequenced using exhaustive capillary-based techniques.
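The Jβ usage fractions and their fold range can be tabulated from per-sequence J assignments as sketched below. The function name and the example gene labels are illustrative. Only J segments that actually appear in the input enter the fold calculation, so an unused segment (zero count) would need separate handling.

```python
from collections import Counter


def j_usage(j_calls):
    """Fraction of sequences assigned to each J segment, and the max/min fold range.

    Only J segments that appear in j_calls enter the fold calculation.
    """
    counts = Counter(j_calls)
    total = sum(counts.values())
    fractions = {j: n / total for j, n in counts.items()}
    fold_range = max(fractions.values()) / min(fractions.values())
    return fractions, fold_range


if __name__ == "__main__":
    # Hypothetical J assignments for a handful of sequences.
    calls = ["TRBJ2-1"] * 20 + ["TRBJ2-7"] * 5 + ["TRBJ1-1"] * 1
    fractions, fold = j_usage(calls)
    print(fractions, round(fold, 1))
```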

Example 9 Nucleotide Insertion Bias

Much of the diversity at the CDR3 junctions in TCR α and β chains is created by non-templated nucleotide insertions by the enzyme terminal deoxynucleotidyl transferase (TdT). However, in vivo, selection plays a significant role in shaping the TCR repertoire, so the observed repertoire does not directly reveal the insertion preferences of TdT. The TdT nucleotide insertion frequencies, independent of selection, were therefore calculated using out-of-frame TCR sequences. These sequences are non-functional rearrangements that are carried on one allele in T cells where the second allele has a functional rearrangement. The mono-nucleotide insertion bias of TdT favors C and G (Table 11).

TABLE 11
Mono-nucleotide bias in out-of-frame data

          A      C      G      T
Lane 1    0.24   0.294  0.247  0.216
Lane 2    0.247  0.284  0.256  0.211
Lane 3    0.25   0.27   0.268  0.209
Lane 4    0.255  0.293  0.24   0.21

Similar nucleotide frequencies are observed in the in-frame sequences (Table 12).

TABLE 12
Mono-nucleotide bias in in-frame data

          A      C      G      T
Lane 1    0.21   0.285  0.275  0.228
Lane 2    0.216  0.281  0.266  0.235
Lane 3    0.222  0.266  0.288  0.221
Lane 4    0.206  0.294  0.228  0.27

The N regions from the out-of-frame TCR sequences were used to measure the di-nucleotide bias. To isolate the marginal contribution of a di-nucleotide bias, the di-nucleotide frequencies were divided by the mono-nucleotide frequencies of each of the two bases. The measure is

$m = \frac{f(n_1 n_2)}{f(n_1)\, f(n_2)}.$

The matrix for m is found in Table 13.

TABLE 13
Di-nucleotide odds ratios for out-of-frame data

     A      C      G      T
A    1.198  0.938  0.945  0.919
C    0.988  1.172  0.88   0.931
G    0.993  0.701  1.352  0.964
T    0.784  1.232  0.767  1.23

Many of the di-nucleotides are under- or over-represented. As an example, the odds of finding a GG pair are very high (m = 1.352, the largest value in Table 13). Since the codons GGN translate to glycine, many glycines are expected in the CDR3 regions.
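The odds ratio m = f(n₁n₂) / (f(n₁) f(n₂)) can be computed from a set of N-region sequences as sketched below. This sketch assumes that all N-region bases are pooled to obtain f(n), and that all adjacent pairs within each N region are pooled to obtain f(n₁n₂). The source does not specify exactly how the frequencies were pooled, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def dinucleotide_odds_ratios(n_regions):
    """Return m[a + b] = f(ab) / (f(a) * f(b)) over pooled N-region sequences."""
    mono, di = Counter(), Counter()
    for s in n_regions:
        s = s.upper()
        mono.update(s)                                      # single-base counts
        di.update(s[i:i + 2] for i in range(len(s) - 1))    # adjacent-pair counts
    n_mono, n_di = sum(mono.values()), sum(di.values())
    m = {}
    for a, b in product("ACGT", repeat=2):
        fa, fb = mono[a] / n_mono, mono[b] / n_mono
        fab = di[a + b] / n_di if n_di else 0.0
        m[a + b] = fab / (fa * fb) if fa and fb else float("nan")
    return m


if __name__ == "__main__":
    # Hypothetical N-region sequences; real input would be out-of-frame N regions.
    ratios = dinucleotide_odds_ratios(["GGGC", "CGGA", "TGGC"])
    print(round(ratios["GG"], 3))
```

A ratio above 1 means the pair occurs more often than expected from the single-base frequencies alone, as with GG in Table 13.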

Example 10 Amino Acid Distributions in the CDR3 Regions

The distribution of amino acids in the CDR3 regions of TCRβ chains is shaped by the germline sequences of the V, D, and J regions, the insertion bias of TdT, and selection. The distribution of amino acids in this region is very similar across the four different T cell sub-compartments. When the sequences were separated into β chains of fixed length, a position-dependent distribution of amino acids was obtained, with the amino acids grouped by six chemical properties: small, special, and large hydrophobic; neutral polar; acidic; and basic. The distributions are virtually identical except for the CD8⁺ antigen-experienced T cells, which have a higher proportion of acidic residues, particularly at position 5.

Of particular interest is the comparison between CD8⁺ and CD4⁺ TCR sequences, as they bind to peptides presented by class I and class II HLA molecules, respectively. The CD8⁺ antigen-experienced T cells have a few positions with a higher proportion of acidic amino acids. This could be due to binding with a basic residue found on HLA class I molecules but not on class II molecules.
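A position-dependent class distribution for CDR3 amino acid sequences of one fixed length can be sketched as follows. The exact residue-to-class assignment used in the source is not given in this section, so the `AA_CLASS` mapping below is a hypothetical illustration of the six groups. Positions in the output are 0-based, so index 4 corresponds to position 5 if positions are counted from 1.

```python
from collections import Counter

# Hypothetical grouping of the 20 amino acids into six chemical classes.
AA_CLASS = {
    **dict.fromkeys("AVLI", "small hydrophobic"),
    **dict.fromkeys("FWYM", "large hydrophobic"),
    **dict.fromkeys("CGP", "special"),
    **dict.fromkeys("STNQ", "neutral polar"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("KRH", "basic"),
}


def positional_class_distribution(cdr3s, length):
    """Per-position class fractions for CDR3 amino acid sequences of one length."""
    per_pos = [Counter() for _ in range(length)]
    for s in cdr3s:
        if len(s) != length:
            continue                      # compare only chains of the same length
        for i, aa in enumerate(s):
            per_pos[i][AA_CLASS.get(aa, "unknown")] += 1
    result = []
    for counts in per_pos:
        total = sum(counts.values())
        result.append({cls: n / total for cls, n in counts.items()} if total else {})
    return result


if __name__ == "__main__":
    dist = positional_class_distribution(["CASD", "CASE", "CASK", "CASSL"], 4)
    print(dist[3])
```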

Example 11 TCR β Chains with Identical Amino Acid Sequences Found in Different People

The TCR β chain sequences were translated to amino acids and then compared pairwise between the two donors. Many thousands of exact sequence matches were observed. For example, comparing the CD4⁺CD45RO⁻ sub-compartments, approximately 8,000 of the 250,000 unique amino acid sequences from donor 1 were exact matches to donor 2. Many of these sequences that match at the amino acid level have multiple nucleotide differences at third codon positions. In the example above, 1,500 of the 8,000 identical amino acid matches had more than 5 nucleotide mismatches. Between any two T cell sub-types, 4-5% of the unique TCRβ sequences were found to have identical amino acid matches.
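The pairwise comparison can be sketched by translating in-frame nucleotide sequences with the standard genetic code, intersecting the two donors' amino acid sets, and recording, for each shared amino acid sequence, the smallest nucleotide mismatch count between any two encoding sequences. The function names and the use of the minimum mismatch count are illustrative assumptions. Counting the matches with more than 5 mismatches is then a simple filter over the returned values.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nt: str) -> str:
    """Translate an in-frame nucleotide sequence ('X' for unknown codons)."""
    nt = nt.upper()
    return "".join(CODONS.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def shared_amino_acid_matches(donor1, donor2):
    """Map each amino acid sequence present in both donors to the minimum
    number of nucleotide mismatches between any pair of encoding sequences."""
    def by_aa(seqs):
        groups = {}
        for s in seqs:
            if len(s) % 3 == 0:                      # in-frame sequences only
                groups.setdefault(translate(s), set()).add(s.upper())
        return groups

    g1, g2 = by_aa(donor1), by_aa(donor2)
    return {aa: min(sum(a != b for a, b in zip(s1, s2))
                    for s1 in g1[aa] for s2 in g2[aa])
            for aa in g1.keys() & g2.keys()}


if __name__ == "__main__":
    matches = shared_amino_acid_matches(["TGTGCCAGC"], ["TGCGCTAGT", "TTTGCCAGC"])
    print(matches)
    print(sum(1 for mm in matches.values() if mm > 5), "matches with >5 mismatches")
```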

Two possibilities were examined: 1) that selection during TCR development produces these common sequences, and 2) that the large bias in nucleotide insertion frequency by TdT creates similar nucleotide sequences. The in-frame pairwise matches were compared to the out-of-frame pairwise matches (see Examples 1-4, above). Changing frames preserves all of the features of the genetic code, so the same number of matches should be found if the sequence bias were responsible for the entire observation. However, almost twice as many in-frame matches as out-of-frame matches were found, suggesting that selection at the protein level plays a significant role.

To confirm this finding of thousands of identical TCR β chain amino acid sequences, the two donors were compared with the CD8⁺CD62L⁺CD45RA⁺ (naïve-like) TCRs from a third donor, a 44-year-old CMV⁺ Caucasian female. Many thousands of identical pairwise matches at the amino acid level were found between the third donor and each of the original two donors. By contrast, only 460 sequences were shared among all three donors. The large variation in the total number of unique sequences between the donors is a product of the starting material and of variations in loading onto the sequencer, and is not representative of a variation in true diversity in the blood of the donors.

Example 12 Higher Frequency Clonotypes are Closer to Germline

The copy number of different sequences within every T cell sub-compartment varied by a factor of over 10,000. The only property that correlated with copy number was the sum of the number of insertions and the number of deletions, which was inversely correlated with copy number. The analysis showed that deletions play a smaller role than insertions in this inverse correlation.
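Because copy numbers span more than four orders of magnitude, a rank-based statistic is a natural way to test such an inverse relationship. The sketch below computes a Spearman rank correlation (the Pearson correlation of average ranks) between per-sequence insertion-plus-deletion counts and copy numbers. The choice of Spearman's statistic is an assumption, since the source does not name the correlation measure it used. Insertions and deletions can be passed separately to compare their individual contributions.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical per-sequence data: (insertions + deletions) and copy number.
    indels = [0, 2, 3, 5, 8, 9]
    copies = [900, 120, 150, 20, 3, 1]
    print(round(spearman(indels, copies), 3))
```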

Sequences with fewer insertions and deletions have receptor sequences closer to germline. One possible explanation for the increased copy number of sequences closer to germline is that they are created multiple times during T cell development. Since germline sequences are shared between people, shared TCRβ chains are likely to be those with a small number of insertions and deletions.

Example 13 “Spectratype” Analysis of TCRβ CDR3 Sequences by V Gene Segment Utilization and CDR3 Length

TCR diversity has commonly been assessed using TCR spectratyping, an RT-PCR-based technique that does not assess TCR CDR3 diversity at the sequence level. Instead, it evaluates the diversity of TCRα or TCRβ CDR3 lengths expressed as mRNA in subsets of αβ T cells that use the same Vα or Vβ gene segment. The spectratypes of polyclonal T cell populations with diverse repertoires of TCR CDR3 sequences, such as are seen in umbilical cord blood or in the peripheral blood of healthy young adults, typically contain CDR3 sequences of 8-10 different lengths that are multiples of three nucleotides, reflecting the selection for in-frame transcripts. Spectratyping also provides roughly quantitative information about the relative frequency of CDR3 sequences of each specific length. To assess whether direct sequencing of TCRβ CDR3 regions from T cell genomic DNA using the sequencer could faithfully capture all of the CDR3 length diversity identified by spectratyping, “virtual” TCRβ spectratypes (see Examples above) were generated from the sequence data and compared with TCRβ spectratypes generated using conventional PCR techniques. The virtual spectratypes contained all of the CDR3 length and relative frequency information present in the conventional spectratypes; that is, direct TCRβ CDR3 sequencing captures all of the TCR diversity information present in a conventional spectratype. Standard TCRβ spectratype data were compared with calculated TCRβ CDR3 length distributions for sequences that use representative TCR Vβ gene segments and are present in CD4⁺CD45RO⁺ cells from donor 1. Reducing the information contained in the sequence data to a frequency histogram of the unique CDR3 sequences with different lengths within each Vβ family readily reproduces all of the information contained in the spectratype data. In addition, the virtual spectratypes revealed the presence within each Vβ family of rare CDR3 sequences with both very short and very long CDR3 lengths that were not detected by conventional PCR-based spectratyping.
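A virtual spectratype as described above, a histogram of unique CDR3 sequences by length within each Vβ family, can be sketched as follows. The input format, a list of (V gene, CDR3 nucleotide sequence) pairs, and the function name are assumptions for illustration. Duplicate pairs are counted once, matching the "unique CDR3 sequences" in the text. A conventional spectratype, by contrast, reflects transcript abundance.

```python
from collections import Counter, defaultdict


def virtual_spectratype(records):
    """records: iterable of (v_gene, cdr3_nt). Returns {v_gene: {length: n_unique}}."""
    by_v = defaultdict(Counter)
    seen = set()
    for v, cdr3 in records:
        if (v, cdr3) in seen:
            continue                      # count each unique sequence once
        seen.add((v, cdr3))
        by_v[v][len(cdr3)] += 1
    return {v: dict(sorted(c.items())) for v, c in by_v.items()}


if __name__ == "__main__":
    # Hypothetical records; in-frame CDR3 lengths are multiples of 3 nucleotides.
    recs = [("TRBV5-1", "A" * 36), ("TRBV5-1", "C" * 36), ("TRBV5-1", "A" * 36),
            ("TRBV5-1", "G" * 39), ("TRBV7-2", "T" * 42)]
    for v, hist in virtual_spectratype(recs).items():
        print(v, hist)
```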

Example 14 Estimation of Total CDR3 Sequence Diversity

After error correction, the number of unique CDR3 sequences observed in each lane of the sequencer flow cell routinely exceeded 1 × 10⁵. Given that the PCR products sequenced in each lane were necessarily derived from a small fraction of the T cell genomes present in each of the two donors, the total number of unique TCRβ CDR3 sequences in the entire T cell repertoire of each individual is likely to be far higher. Estimating the number of unique sequences in the entire repertoire, therefore, requires an estimate of the number of additional unique CDR3 sequences that exist in the blood but were not observed in the sample. The estimation of total species diversity in a large, complex population using measurements of the species diversity present in a finite sample has historically been called the “unseen species problem” (see Examples above). The solution starts with determining the number of new species, or TCRβ CDR3 sequences, that are observed if the experiment is repeated, i.e., if the sequencing is repeated on an identical sample of peripheral blood T cells, e.g., an identically prepared library of TCRβ CDR3 PCR products in a different lane of the sequencer flow cell, and counting the number of new CDR3 sequences. For CD8⁺CD45RO⁻ cells from donor 2, the predicted and observed numbers of new CDR3 sequences in a second lane are within 5% (see Examples above), suggesting that this analytic solution can, in fact, be used to estimate the total number of unique TCRβ CDR3 sequences in the entire repertoire.

The resulting estimates of the total number of unique TCRβ CDR3 sequences in the four flow cytometrically-defined T cell compartments are shown in Table 14.

TABLE 14
TCR repertoire diversity

Donor  CD8  CD4  CD45RO  Diversity
1      +    −    +       6.3 × 10⁵
1      +    −    −       1.24 × 10⁶
1      −    +    +       8.2 × 10⁵
1      −    +    −       1.28 × 10⁶
1      Total T cell diversity   3.97 × 10⁶
2      +    −    +       4.4 × 10⁵
2      +    −    −       9.7 × 10⁵
2      −    +    +       8.7 × 10⁵
2      −    +    −       1.03 × 10⁶
2      Total T cell diversity   3.31 × 10⁶

Of note, the total TCRβ diversity in these populations is between 3 and 4 million unique sequences in the peripheral blood. Surprisingly, the CD45RO⁺, or antigen-experienced, compartment constitutes approximately 1.5 million of these sequences. This is at least an order of magnitude larger than expected. This discrepancy is likely attributable to the large number of these sequences observed at low relative frequency, which could only be detected through deep sequencing. The estimated TCRβ CDR3 repertoire sizes of each compartment in the two donors are within 20% of each other.
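The totals in Table 14, and the CD45RO⁺ subtotals discussed above, can be checked with a few lines of arithmetic over the table's own values. The dictionary layout and the function name are illustrative. The sums reproduce the "Total T cell diversity" rows (3.97 × 10⁶ and 3.31 × 10⁶). The CD45RO⁺ subtotals come to about 1.45 × 10⁶ for donor 1 and 1.31 × 10⁶ for donor 2.

```python
# Per-compartment diversity estimates transcribed from Table 14.
TABLE_14 = {
    1: {"CD8+CD45RO+": 6.3e5, "CD8+CD45RO-": 1.24e6,
        "CD4+CD45RO+": 8.2e5, "CD4+CD45RO-": 1.28e6},
    2: {"CD8+CD45RO+": 4.4e5, "CD8+CD45RO-": 9.7e5,
        "CD4+CD45RO+": 8.7e5, "CD4+CD45RO-": 1.03e6},
}


def summarize(table):
    """Return {donor: (total diversity, CD45RO+ subtotal)}."""
    out = {}
    for donor, compartments in table.items():
        total = sum(compartments.values())
        ro_plus = sum(v for k, v in compartments.items() if k.endswith("CD45RO+"))
        out[donor] = (total, ro_plus)
    return out


if __name__ == "__main__":
    for donor, (total, ro_plus) in summarize(TABLE_14).items():
        print(f"donor {donor}: total {total:.3g}, CD45RO+ {ro_plus:.3g}")
```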

The results herein demonstrate that the realized TCRβ receptor diversity (~4 × 10⁶ distinct CDR3 sequences) is at least five-fold higher than previous estimates. In particular, they suggest far greater TCRβ diversity among CD45RO⁺ antigen-experienced αβ T cells (~1.5 × 10⁶ distinct CDR3 sequences) than has previously been reported. However, bioinformatic analysis of the TCR sequence data shows strong biases in the mono- and di-nucleotide content, implying that the utilized TCR sequences are sampled from a distribution much smaller than the theoretical size. Because the large diversity of TCRβ chains in each person is sampled from a severely constricted space of sequences, overlap of the TCR sequence pools between individuals can be expected. In fact, about 5% of CD8⁺ naïve TCRβ chains had exact amino acid matches shared between each pair of the three different individuals. As the TCRα pool has previously been measured to be substantially smaller than the theoretical TCRβ diversity, these results show that hundreds to thousands of truly public αβ TCRs can be found.

1. A method, comprising: generating a plurality of amplicons comprising rearranged TCR or IG CDR3 nucleic acid sequences from a first sample comprising nucleic acid molecules obtained from lymphocytes from a mammalian subject, using a plurality of V-segment primers and a plurality of J-segment primers in a single multiplex PCR, wherein each V-segment primer is complementary to one or more functional TCR or IG V-region gene segments, wherein each J-segment primer is complementary to one or more functional TCR or IG J-region gene segments, wherein said plurality of V-segment primers and said plurality of J-segment primers are capable of amplifying in said single multiplex PCR at least 10⁴ distinct amplicons representing a diversity of rearranged TCR or IG CDR3 nucleic acid sequences present in said first sample; sequencing said plurality of amplicons; and determining a total diversity of rearranged TCR or IG CDR3 nucleic acid sequences from said plurality of sequenced amplicons of said first sample.
2. The method of claim 1, wherein said determining comprises quantifying said plurality of sequenced amplicons generated in said single multiplex PCR.
3. The method of claim 2, wherein said quantifying comprises quantifying the number of distinct rearranged TCR or IG CDR3 nucleic acid sequences from said first sample.
4. The method of claim 2, wherein said determining comprises estimating a total diversity of rearranged TCR or IG CDR3 nucleic acid sequences.
5. The method of claim 1, wherein said total diversity comprises at least 10⁴ distinct rearranged TCR or IG CDR3 nucleic acid sequences.
6. The method of claim 1, wherein said total diversity comprises at least 10⁵ distinct rearranged TCR or IG CDR3 nucleic acid sequences.
7. The method of claim 1, wherein said total diversity comprises at least 10⁶ distinct rearranged TCR or IG CDR3 nucleic acid sequences.
8. The method of claim 1, wherein said nucleic acid molecules comprise genomic DNA.
9. The method of claim 1, wherein said nucleic acid molecules comprise cDNA.
10. The method of claim 1, wherein said nucleic acid molecules comprise mRNA.
11. The method of claim 1, wherein sequencing comprises high-throughput sequencing of said plurality of amplicons using a set of sequencing oligonucleotides that hybridize to a defined region in each of said plurality of amplicons.
12. The method of claim 1, wherein said TCR V-region gene segment comprises a TCR Vδ segment and wherein said TCR J-region gene segment comprises a TCR Jδ segment.
13. The method of claim 1, wherein said TCR V-region gene segment comprises a TCR Vγ segment and wherein said TCR J-region gene segment comprises a TCR Jγ segment.
14. The method of claim 1, wherein said TCR V-region gene segment comprises a TCR Vα segment and wherein said TCR J-region gene segment comprises a TCR Jα segment.
15. The method of claim 1, wherein said TCR V-region gene segment comprises a TCR Vβ segment and wherein said TCR J-region gene segment comprises a TCR Jβ segment.
16. The method of claim 1, wherein said Ig V-region gene segment comprises an IGH V gene segment and wherein said Ig J-region gene segment comprises an IGH J gene segment.
17. The method of claim 1, wherein said Ig V-region gene segment comprises an IGL V gene segment and wherein said Ig J-region gene segment comprises an IGL J gene segment.
18. The method of claim 1, wherein said Ig V-region gene segment comprises an IGK V gene segment and wherein said Ig J-region gene segment comprises an IGK J gene segment.
19. The method of claim 1, wherein each of said distinct rearranged TCR or IG CDR3 nucleic acid sequences spans a V-D-J or V-J junction.
20. The method of claim 1, further comprising: generating a plurality of amplicons comprising rearranged TCR or IG CDR3 nucleic acid molecules from a second sample comprising nucleic acid molecules obtained from lymphocytes from a mammalian subject, using said plurality of V-segment primers and said plurality of J-segment primers in a single multiplex PCR; sequencing said plurality of amplicons; and determining a total diversity of rearranged TCR or IG CDR3 nucleic acid sequences from said plurality of sequenced amplicons of said second sample.
21. The method of claim 20, wherein said first sample and said second sample are from the same mammalian subject.
22. The method of claim 21, wherein said first sample is obtained from said mammalian subject before a treatment and said second sample is obtained from said mammalian subject after said treatment.
23. The method of claim 21, further comprising comparing the total diversity of rearranged TCR or IG CDR3 nucleic acid sequences of said first sample with the total diversity of rearranged TCR or IG CDR3 nucleic acid sequences of said second sample.
24. The method of claim 23, further comprising assessing an effect of a treatment based on differences in measurements of diversity between said first and second samples.
25. The method of claim 23, further comprising assessing an immunocompetence of said subject based on differences in measurements of diversity between said first and second samples.